QoS Question

Hello all. I can't for the life of me seem to understand this, or find someone who can explain this.



If I am doing some LLQ and have a priority queue with 33% bandwidth guaranteed (3.33 mbps), but the queue limit is 64 packets - how does this really do me any good? I'm terribly confused! I know queuing only occurs during congestion (can anyone explain how I can determine when that point is?), but let us pretend the 10 mbps circuit is maxed out. Say a boat load of traffic comes into the router while this thing is maxed out, and packets start to get queued. Well, if all queues are 64 packets at most, and a packet is at most 12000 bits, that means a queue holds roughly 750000 bits of data at most (or 0.75 mb - but in reality 64 voice packets is only going to be around 41000 bits). If there's just 3 queues, that's only 2.25 mb totally accounted for. These queues can be dumped at sub-second rates, so they can be filled multiple times within a second I imagine - but I am having a hard time understanding this. Especially when a default queue gets 61% remaining BW, and a priority queue only gets 33% guaranteed - how the hell does this actually work out when the queue limit is exactly the same size? If everything starts getting queued up and the default class should use 61% of the remaining bandwidth, shouldn't it need to hold more packets?


I'm really hopeful someone can make sense of this - it's got me going crazy!


The only way I can see this making sense is that the link is maxed out, the router knows 80% is default class currently in the hardware queue, and it is willing to empty out the software priority queue enough so that additional voice traffic can get sent out (up to 3.33 mbps). I don't know - maybe I've gone off the deep end here lol. It's just kind of funny that queues are measured in packets, but packets vary in size and you are really trying to just guarantee bandwidth, and 3.33 mbps essentially equates to 38 people making outgoing calls with G.711 encoding guaranteed, and that's going to equate to 1900 packets per second but the hardware queue is only 1000 packets? But obviously that isn't 1000 packets per second, the 2911 can forward way more packets per second (up to 183,000 if all packets are 64 bytes I believe) which allows for 180 mbps, which is why we need to use much beefier ASR 1000 series routers on the MPLS head end.


This stuff just happens so fast (milliseconds) that I guess it's hard to imagine/comprehend?


  • As I often say when I'm teaching BGP, "think like a router, son!"

    Slow down and imagine yourself as a router with a 100G port (i.e. mega-FAT pipe) and a 10M pipe. In other words, the 100G port can so easily fill the queues, it's not funny. Create three buckets you'll use to visualize queues. Drill a hole in the bottom for the packets to drain out, but keep something handy to plug the hole (you're going to need it).

    Now, send a packet in via the 100G port. Deposit it in the correct queue. Service the packet by transmitting it out the 10M port. Calculate how many milliseconds it'll take to service the packet, and ignore the "milli" for the sake of visualization: pretend that the port is now "busy" for X seconds (instead of X milliseconds). Guess what? More packets arrive. Assign them to the proper queues (assume the individual queues operate in FIFO fashion).

    Now, your job is to service the queues based on how you're configured. Once X (milli)seconds go by, decide which queue to service next. If the rule says 33.3% bandwidth guaranteed, you can service that queue such that it keeps the 10M port (egress) busy for 33.3% of the time. Otherwise, you need to service the other queue(s) as you're configured. Remember that the priority queue gets priority, so if you happen to drain the priority queue at one particular moment but have packets waiting in multiple other queues, you're only going to service one of those non-priority queues before rechecking the priority queue for any queued packets (and service them, if under 33.3% usage and congested, or regardless if uncongested).
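
    The service loop described above can be sketched as a toy model in Python. This is only an illustration of the bucket analogy, not how IOS actually implements LLQ: the function names, the round-robin handling of the non-priority queues, and the 200-byte and 1500-byte packet sizes are all assumptions made for the example.

```python
from collections import deque

LINK_BPS = 10_000_000      # the 10M egress port from the example
PRIORITY_SHARE = 1 / 3     # the 33.3% priority guarantee


def serialization_s(size_bytes, link_bps=LINK_BPS):
    """Seconds the egress port stays busy sending one packet."""
    return size_bytes * 8 / link_bps


def drain(priority, others, share=PRIORITY_SHARE):
    """Serve FIFO queues of packet sizes (bytes) until all are empty.

    Returns the transmit order as (queue_name, size_bytes) tuples.
    """
    sent = []
    busy_total = 0.0
    busy_priority = 0.0
    next_other = 0  # round-robin pointer over the non-priority queues
    while priority or any(others):
        others_waiting = any(others)
        # Priority is capped at its share only while other queues are waiting.
        under_share = busy_total == 0 or busy_priority / busy_total < share
        if priority and (under_share or not others_waiting):
            size = priority.popleft()
            busy_priority += serialization_s(size)
            name = "priority"
        else:
            # Serve exactly one non-priority packet, then recheck priority.
            while not others[next_other % len(others)]:
                next_other += 1
            idx = next_other % len(others)
            size = others[idx].popleft()
            name = f"class{idx}"
            next_other += 1
        busy_total += serialization_s(size)
        sent.append((name, size))
    return sent


if __name__ == "__main__":
    voice = deque([200] * 10)                       # assumed voice packet size
    bulk = [deque([1500] * 10), deque([1500] * 10)]  # assumed bulk packet size
    for name, size in drain(voice, bulk)[:8]:
        print(name, size)
```

    Printing the first few entries shows several small priority packets going out for every large packet from the other classes, because the share is measured in time on the wire rather than in packet count.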

    The nutshell is that the queues will be draining at a finite rate (very predictable if every packet in that queue is constant size, not so predictable otherwise). Once that becomes clear, your risk is not the overall inbound packet rate, but the inbound packet rate on a per-queue basis: if too many packets come in for a particular queue, they'll spill out over the sides of the bucket and get lost.

    There's also the reality that a VoIP phone call or a videoconference stream is NOT bursty: time must elapse for the next VoIP packet to be created, as it's dependent on real-time audio for the data it needs to shove into the packet. So (at least at a per-flow basis) it's relatively impossible for an onslaught of priority packets to arrive; they're far more likely to arrive at an even pace. If your fill rate is constant and your drain rate is constant, your risk only appears if your fill rate exceeds the drain rate (and I'd argue that the queue size is irrelevant then: you were destined to drop packets, it was merely a question of when).
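
    To put rough numbers on the fill-versus-drain argument, here is a small Python sketch. It assumes constant rates, and the figures are assumptions for illustration: G.711 at 50 packets per second per call and 218 bytes per packet on an Ethernet wire (160 payload + 40 RTP/UDP/IP + 18 Ethernet), with the 38 calls and 3.33 mbps taken from the original question. The function names are made up for the example.

```python
def seconds_until_first_drop(queue_limit, arrival_pps, service_pps):
    """Time for an empty queue to overflow under constant fill/drain rates.

    Returns None when the drain rate keeps up, i.e. the queue never builds.
    """
    if arrival_pps <= service_pps:
        return None
    return queue_limit / (arrival_pps - service_pps)


def full_queue_delay_ms(queue_limit, packet_bytes, drain_bps):
    """Worst-case wait behind a full queue of equal-size packets."""
    return queue_limit * packet_bytes * 8 / drain_bps * 1000


# Assumed figures: 38 G.711 calls, 50 packets/s each, 218 bytes per packet.
CALLS = 38
VOICE_PPS = CALLS * 50                   # packets/s offered to the queue
DRAIN_PPS = 3_330_000 / (218 * 8)        # packets/s that 3.33 mbps can carry

if __name__ == "__main__":
    # Drain keeps up with fill, so the 64-packet limit is never reached.
    print(seconds_until_first_drop(64, VOICE_PPS, DRAIN_PPS))
    # Fill exceeds drain by 100 packets/s: drops start in under a second.
    print(seconds_until_first_drop(64, 2000, 1900))
    # Delay a packet sees if it arrives behind 64 queued voice packets.
    print(round(full_queue_delay_ms(64, 218, 3_330_000), 1))
```

    Under these assumptions the queue limit mostly bounds how long a packet can wait, not how much bandwidth the class gets; a deeper priority queue would only add delay before the same drops happen.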

    Also note that if traffic subsides to perhaps 8M sustained, you can easily service all of your queues, and at that point you can service the priority queue more than 33.3% of the time.

  • Peety - thank you for helping me!


    I stepped back after posting this and taking a break, and it started to make some sense to me. Your explanation confirms and clears up any doubts I had. It is definitely a tough thing to process because of how fast packets are moving. It's funny at first seeing a small queue limit of 64 packets and just thinking "that's it???" but I am able to somewhat see how this is working.


    Your last point mentions that you can service the priority queue at more than 33.33% of the time if traffic were to subside to 8 mbps.  My understanding was that queuing only occurs during periods of congestion. My question then really is this - Is my previous statement regarding queuing correct? If so, how am I able to determine what is considered congestion? I am having a hard time understanding the tx-ring, if the answer is related to that. It would appear that the physical egress interface allows for 1000 packets, based off of a "show interface brief gig[x/x]"


    Thank you again!

  • Once you dig around long enough, you'll realize that every platform has a very specific set of switching mechanisms (CEF is a good high-level example, but there are details under the hood as well). The reality is that queueing has to happen for every packet, in order to make the switching path cleaner and simpler. For example: if you're in the middle of transmitting a packet, and another packet arrives, what will you do with it? Answer is you need to queue it. There's a lot of deeper stuff to that depending on the platform. So for the sake of cleaner handling, queueing is in effect all the time, it's just possible that the queue never builds.

    The TX ring is typically a two-packet buffer that sits between the broader queueing activities and the low-level NIC driver. Packets actually flow from the queueing that you configure into the TX ring, so that the system doesn't have to handle an interrupt EVERY time and can quickly grab a packet to send out the wire/fiber. You could possibly shorten it to avoid any FIFO bottlenecks that could occur there (driving up CPU) or you could perhaps lengthen it if CPU contention was a problem, but I'd almost universally leave it at 2.

    Also, remember that any given port is technically always transmitting at line rate; it just could be transmitting <null>. A packet of X bytes at line rate Y will always take Z <micro|milli> seconds to send. You may want to calculate packets-per-second for several common interface types (for 64 / 300 / 1500 / 9000-byte packets) and/or the corresponding transmit duration timings so you can have them handy. A lot goes on, but the reality is that a given router or switch just does what it's told (and very fast), so it follows these routines no matter how busy or bored it might be.
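
    The cheat sheet suggested above could be generated with a few lines of Python. This is a simplified sketch: it counts only the packet bytes themselves and ignores per-frame overhead such as the Ethernet preamble and inter-frame gap, and the set of link rates is just an assumption based on the speeds mentioned in this thread.

```python
LINK_RATES_BPS = {
    "10M": 10_000_000,
    "1G": 1_000_000_000,
    "10G": 10_000_000_000,
    "100G": 100_000_000_000,
}
PACKET_SIZES_BYTES = (64, 300, 1500, 9000)


def transmit_time_us(size_bytes, rate_bps):
    """Microseconds needed to clock one packet onto the wire."""
    return size_bytes * 8 / rate_bps * 1_000_000


def packets_per_second(size_bytes, rate_bps):
    """Back-to-back packets of this size that fit in one second."""
    return rate_bps / (size_bytes * 8)


if __name__ == "__main__":
    for name, rate in LINK_RATES_BPS.items():
        for size in PACKET_SIZES_BYTES:
            print(
                f"{name:>5} {size:>5} B "
                f"{transmit_time_us(size, rate):12.3f} us "
                f"{packets_per_second(size, rate):16,.0f} pps"
            )
```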

  • To elaborate a bit on my comment about switching mechanisms, I'll tell you about what "I had to learn" when I got into the ISP business "for real" back in 2003 and then migrated that company's network to Cisco gear starting in 2004. I'll do this in a slightly more logical sequence than I learned it, to help you follow the flow a bit better.

    The 7200 series routers are classic workhorses: solid, reliable, simple. The chassis is modular in that it has six Port Adapter (PA) slots (you'll hear more about PAs later...), plus a slot for an I/O controller (originally for console/aux/flash, later added some Ethernet ports) and a back-side slot for a Network Processing Engine (NPE, which is the brains and forwarding engine of the box). PAs are hot-swappable, IO and NPE are not. These will run forever, but the NPE is a distinct limit to packets-per-second forwarding rate and an NPE swap is necessary to get more horsepower. Packets come in on the PAs, go to the NPE for forwarding, and go out the PAs, period.

    The 7500 series routers had a lot of promise, but I'd argue that they never delivered. Take for example the 7507: if I remember correctly, there were two slots for Route Switch Processors (RSPs) and five slots for Interface Processors (IPs). In the beginning, it was basically a bigger edition of the 7200: IPs were bigger than PAs, RSPs were different and beefier than NPEs. Packets came in on the IPs, went to the RSP for forwarding, and went out the IPs, and that was it.

    Cisco later realized that the RSPs couldn't really scale big enough to handle the theoretical throughput possible in the backplane, and that the Internet was growing so packet forwarding mattered. They figured out how to create a Versatile Interface Processor (VIP), which could accept two 7200 PAs and allow people to move PAs between different platforms (easier spares, etc.). Later, they figured out how to put a true CPU on the VIP, and push a copy of the CEF FIB down to the VIP, which created "distributed switching".

    Now, things got really complicated: packets could arrive on an IP and leave on an IP, so the RSP handles the packets like it always has. BUT, if there's a VIP in slot 6, packets might arrive on a VIP and leave on an IP, so they can get forwarded by the VIP but have to ride the backplane over to the IP. Conversely, packets might arrive on an IP and leave on a VIP, so they have to go to the RSP for forwarding then ride the backplane onto the VIP. And of course, packets could arrive on a VIP and leave on a VIP, so the ingress VIP could do the forwarding and not really bother the RSP with the forwarding. HOWEVER, the forwarding tables had a habit of getting ugly at times, so they had to write the code such that you could disable distributed switching. Now all of a sudden, the VIPs turn into glorified IPs, and everything goes through the RSP. At this point, we have perhaps five different possibilities: non-distributed 100% on the RSP, or IP->IP, IP->VIP, VIP->IP, VIP->VIP.

    I'm sure you're going to say "but wait, if the customer pulls out all of the IPs and only puts in VIPs, they could just stay distributed and clean it all up!" - remember the IPs/VIPs are hot-swappable, so even if there aren't any IPs "right now", an IP could be inserted a second from now, and muck it all up.

    Then, the 12000 ("GSR" or Gigabit Switch Router) came out, somewhere around 1999 or so when the Internet was out of control. Cisco designed it with a switching fabric that was set up in such a way that there's absolutely no way for the entire bandwidth of the box to travel into or out of the Gigabit Route Processor (GRP). They gave every linecard a CPU, gave the CPU a copy of the FIB, and basically declared that distributed switching was mandatory: if the FIB should crash, the linecard goes offline while it reboots (hopefully). Now, although there's complexity in managing a full set of distributed FIBs, the switching strategy is simplified: every packet is routed by the ingress card's CPU, period. Now things are easier, at least with respect to forwarding logic.

    Now, do you really want to overlay "queueing or not" on the fly? :) Hopefully, you said no: let's queue everything all the time, even if there's no congestion, and just drain the queues immediately when nothing else is waiting to be sent.

  • Thank you for the reply, I really appreciate it. Being young, it's hard to understand the way it was back in the day. It is really interesting to read things like this and hear about how things were.


    I am finding that QoS has weird gray areas, where mixed opinions and bad information are very abundant. I feel somewhat comfortable with QoS, but I will eventually get to the topic-specific book.


    The only interesting thing I have for you now, is that I have my doubts in regards to the 3650 platform. We have an input policy on access ports, but also an output policy as well. I disagree that the output policy on user ports is necessary, but my boss swears it is. I have looked far and wide, but cannot prove or disprove him.

  • "The only interesting thing I have for you now, is that I have my doubts in regards to the 3650 platform. We have an input policy on access ports, but also an output policy as well. I disagree that the output policy on user ports is necessary, but my boss swears it is. I have looked far and wide, but cannot prove or disprove him."

    Imagine a PC connected at GE speeds and uplinks at XE speeds into a datacenter with lots of servers at XE speeds. It's completely feasible for the PC to download a file that could easily arrive at 10-20x the access port speed, thereby able to cause congestion so very easily. The output policy is necessary to ensure that VoIP packets still get priority, and business traffic still gets a share of the access port's bandwidth, even while the other huge but non-business critical file is downloading.

    Think of it this way: congestion can't occur on the inbound side of the port (well, in theory*). If the link is GE, and the sender on the other end of the cable is capable of transmitting data at truly 100% line rate, there's no way for it to transmit 101%, it just can't happen. So if the sender can never transmit more than 100%, the receiver (input side) can never get stuck with >100% traffic. However, if 17 servers all decide to send stuff to one particular PC (by hook or by crook), output congestion towards that PC is entirely possible.
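
    The asymmetry between input and output can be shown with a few lines of Python. The scenario is an assumption for illustration: 17 servers on 10G links each sending 500 Mbps toward one PC on a 1G access port. The function names are hypothetical.

```python
def offered_to_egress_bps(sender_rates_bps, ingress_link_bps):
    """Each sender is capped by its own ingress link, then the flows add up."""
    return sum(min(rate, ingress_link_bps) for rate in sender_rates_bps)


def egress_excess_bps(sender_rates_bps, ingress_link_bps, egress_link_bps):
    """Traffic per second that must be queued or dropped on the output side."""
    offered = offered_to_egress_bps(sender_rates_bps, ingress_link_bps)
    return max(0, offered - egress_link_bps)


if __name__ == "__main__":
    servers = [500_000_000] * 17   # assumed: 17 senders at 500 Mbps each
    ten_gig = 10_000_000_000       # each server's own link
    one_gig = 1_000_000_000        # the PC's access port

    # No single input link is anywhere near full...
    print(max(servers) / ten_gig)
    # ...but the sum heading for the one output port far exceeds it.
    print(offered_to_egress_bps(servers, ten_gig))
    print(egress_excess_bps(servers, ten_gig, one_gig))
```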

    So in the end, the output policy makes absolute sense to me. The input policy could be necessary to ensure packets are marked appropriately for the overall QoS policy (business traffic marked this way but for only up to X% of the link, VoIP traffic marked that way but for only up to Y% of the link, etc.), so the input policy could easily be necessary while serving a different role than the output policy.

  • I would argue though that our 100+ branch sites all have T1s or 10 mbps MPLS circuits as their primary circuit, so in that case there should never be a situation like the hypothetical 17-server one you produced. However, with that idea in mind, it does make sense why we would want the output policy on access ports at our HQ.

  • "I would argue though that our 100+ branch sites all have T1s or 10 mbps MPLS circuits as their primary circuit, so in that case there should never be a situation like the hypothetical 17-server one you produced. However, with that idea in mind, it does make sense why we would want the output policy on access ports at our HQ."

    I have two responses to you:

    1) Pragmatically, this is what "we" call not setting the requirements at the beginning of the project. What I'm saying is you never told me (at least in this post, and I'm not going to automatically review every post you've made just to answer this one) that you had branch offices on T1s or 10M MPLS. Don't drop some new requirements on me after I've solved your problem: be upfront and lay it all out there so those you're enlisting for help can get to it from the word go. It's the same with Project Managers, Architects, etc.: they need ALL of the requirements and assumptions up front.

    2) In the real world, do you REALLY want to support 100+ branch office switches WITHOUT an output policy on them AND perhaps 50 HQ switches WITH an output policy on them? I'll bet the answer is NO, you want the output switching and queueing to be the same everywhere, so supportability is easier (same commands to troubleshoot both sets, same behavior on both sets if congestion were to happen, etc.). Remember, if your 3650s have two different configuration standards, that immediately becomes four config standards as soon as you start rolling out the Next Great Switch. If that deployment gets stalled, and then you move to a different model switch, you're looking at six config standards. Starting with one standard keeps it cleaner when you get forced into two or three because of platform changes.

  • Hey Peety,


    Just want to clarify that we do not actually have any real world problems in our network. I didn't know about the 17 server example until you explained it - and only listed the rest of the details afterwards naturally because I am starting to piece things together.


    Cheers amigo.
