| rfc9957.original | rfc9957.txt | |||
|---|---|---|---|---|
| Network Working Group B. Briscoe, Ed. | Independent Submission B. Briscoe, Ed. | |||
| Internet-Draft Independent | Request for Comments: 9957 Independent | |||
| Intended status: Informational G. White | Category: Informational G. White | |||
| Expires: 26 May 2024 CableLabs | ISSN: 2070-1721 CableLabs | |||
| 23 November 2023 | April 2026 | |||
| The DOCSIS® Queue Protection Algorithm to Preserve Low Latency | The DOCSIS® Queue Protection Algorithm to Preserve Low Latency | |||
| draft-briscoe-docsis-q-protection-07 | ||||
| Abstract | Abstract | |||
| This informational document explains the specification of the queue | This Informational RFC explains the specification of the queue | |||
| protection algorithm used in DOCSIS technology since version 3.1. A | protection algorithm used in Data-Over-Cable Service Interface | |||
| shared low latency queue relies on the non-queue-building behaviour | Specification (DOCSIS) technology since version 3.1. A shared low- | |||
| of every traffic flow using it. However, some flows might not take | latency queue relies on the non-queue-building behavior of every | |||
| such care, either accidentally or maliciously. If a queue is about | traffic flow using it. However, some flows might not take such care, | |||
| to exceed a threshold level of delay, the queue protection algorithm | either accidentally or maliciously. If a queue is about to exceed a | |||
| can rapidly detect the flows most likely to be responsible. It can | threshold level of delay, the queue protection algorithm can rapidly | |||
| then prevent harm to other traffic in the low latency queue by | detect the flows most likely to be responsible. It can then prevent | |||
| ejecting selected packets (or all packets) of these flows. The | harm to other traffic in the low-latency queue by ejecting selected | |||
| document is designed for four types of audience: a) congestion | packets (or all packets) of these flows. This document is designed | |||
| control designers who need to understand how to keep on the 'good' | for four audiences: a) congestion control designers who need to | |||
| side of the algorithm; b) implementers of the algorithm who want to | understand how to keep on the "good" side of the algorithm; b) | |||
| understand it in more depth; c) designers of algorithms with similar | implementers of the algorithm who want to understand it in more | |||
| goals, perhaps for non-DOCSIS scenarios; and d) researchers | depth; c) designers of algorithms with similar goals, perhaps for | |||
| interested in evaluating the algorithm. | non-DOCSIS scenarios; and d) researchers interested in evaluating the | |||
| algorithm. | ||||
| Status of This Memo | Status of This Memo | |||
| This Internet-Draft is submitted in full conformance with the | This document is not an Internet Standards Track specification; it is | |||
| provisions of BCP 78 and BCP 79. | published for informational purposes. | |||
| Internet-Drafts are working documents of the Internet Engineering | ||||
| Task Force (IETF). Note that other groups may also distribute | ||||
| working documents as Internet-Drafts. The list of current Internet- | ||||
| Drafts is at https://datatracker.ietf.org/drafts/current/. | ||||
| Internet-Drafts are draft documents valid for a maximum of six months | This is a contribution to the RFC Series, independently of any other | |||
| and may be updated, replaced, or obsoleted by other documents at any | RFC stream. The RFC Editor has chosen to publish this document at | |||
| time. It is inappropriate to use Internet-Drafts as reference | its discretion and makes no statement about its value for | |||
| material or to cite them other than as "work in progress." | implementation or deployment. Documents approved for publication by | |||
| the RFC Editor are not candidates for any level of Internet Standard; | ||||
| see Section 2 of RFC 7841. | ||||
| This Internet-Draft will expire on 26 May 2024. | Information about the current status of this document, any errata, | |||
| and how to provide feedback on it may be obtained at | ||||
| https://www.rfc-editor.org/info/rfc9957. | ||||
| Copyright Notice | Copyright Notice | |||
| Copyright (c) 2023 IETF Trust and the persons identified as the | Copyright (c) 2026 IETF Trust and the persons identified as the | |||
| document authors. All rights reserved. | document authors. All rights reserved. | |||
| This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| Provisions Relating to IETF Documents (https://trustee.ietf.org/ | Provisions Relating to IETF Documents | |||
| license-info) in effect on the date of publication of this document. | (https://trustee.ietf.org/license-info) in effect on the date of | |||
| Please review these documents carefully, as they describe your rights | publication of this document. Please review these documents | |||
| and restrictions with respect to this document. Code Components | carefully, as they describe your rights and restrictions with respect | |||
| extracted from this document must include Revised BSD License text as | to this document. | |||
| described in Section 4.e of the Trust Legal Provisions and are | ||||
| provided without warranty as described in the Revised BSD License. | ||||
| Table of Contents | Table of Contents | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Introduction | |||
| 1.1. Document Roadmap . . . . . . . . . . . . . . . . . . . . 4 | 1.1. Document Roadmap | |||
| 1.2. Terminology . . . . . . . . . . . . . . . . . . . . . . . 5 | 1.2. Terminology | |||
| 1.3. Copyright Material . . . . . . . . . . . . . . . . . . . 5 | 1.3. Copyright Material | |||
| 2. Approach - In Brief . . . . . . . . . . . . . . . . . . . . . 5 | 2. Approach (In Brief) | |||
| 2.1. Mechanism . . . . . . . . . . . . . . . . . . . . . . . . 6 | 2.1. Mechanism | |||
| 2.2. Policy . . . . . . . . . . . . . . . . . . . . . . . . . 7 | 2.2. Policy | |||
| 2.2.1. Policy Conditions . . . . . . . . . . . . . . . . . . 7 | 2.2.1. Policy Conditions | |||
| 2.2.2. Policy Action . . . . . . . . . . . . . . . . . . . . 7 | 2.2.2. Policy Action | |||
| 3. Necessary Flow Behaviour . . . . . . . . . . . . . . . . . . 7 | 3. Necessary Flow Behavior | |||
| 4. Pseudocode Walk-Through . . . . . . . . . . . . . . . . . . . 8 | 4. Pseudocode Walk-Through | |||
| 4.1. Input Parameters, Constants and Variables . . . . . . . . 9 | 4.1. Input Parameters, Constants, and Variables | |||
| 4.2. Queue Protection Data Path . . . . . . . . . . . . . . . 12 | 4.2. Queue Protection Data Path | |||
| 4.2.1. The qprotect() function . . . . . . . . . . . . . . . 13 | 4.2.1. The qprotect() Function | |||
| 4.2.2. The pick_bucket() function . . . . . . . . . . . . . 14 | 4.2.2. The pick_bucket() Function | |||
| 4.2.3. The fill_bucket() function . . . . . . . . . . . . . 17 | 4.2.3. The fill_bucket() Function | |||
| 4.2.4. The calcProbNative() function . . . . . . . . . . . . 17 | 4.2.4. The calcProbNative() Function | |||
| 5. Rationale . . . . . . . . . . . . . . . . . . . . . . . . . . 18 | 5. Rationale | |||
| 5.1. Rationale: Blame for Queuing, not for Rate in Itself . . 18 | 5.1. Rationale: Blame for Queuing, Not for Rate in Itself | |||
| 5.2. Rationale for Constant Aging of the Queuing Score . . . . 20 | 5.2. Rationale for Constant Aging of the Queuing Score | |||
| 5.3. Rationale for Transformed Queuing Score . . . . . . . . . 21 | 5.3. Rationale for Transformed Queuing Score | |||
| 5.4. Rationale for Policy Conditions . . . . . . . . . . . . . 22 | 5.4. Rationale for Policy Conditions | |||
| 5.5. Rationale for Reclassification as the Policy Action . . . 25 | 5.5. Rationale for Reclassification as the Policy Action | |||
| 6. Limitations . . . . . . . . . . . . . . . . . . . . . . . . . 26 | 6. Limitations | |||
| 7. IANA Considerations (to be removed by RFC Editor) . . . . . . 26 | 7. IANA Considerations | |||
| 8. Implementation Status . . . . . . . . . . . . . . . . . . . . 26 | 8. Implementation Status | |||
| 9. Security Considerations . . . . . . . . . . . . . . . . . . . 27 | 9. Security Considerations | |||
| 9.1. Resource Exhaustion Attacks . . . . . . . . . . . . . . . 28 | 9.1. Resource Exhaustion Attacks | |||
| 9.1.1. Exhausting Flow-State Storage . . . . . . . . . . . . 28 | 9.1.1. Exhausting Flow-State Storage | |||
| 9.1.2. Exhausting Processing Resources . . . . . . . . . . . 29 | 9.1.2. Exhausting Processing Resources | |||
| 10. Comments Solicited . . . . . . . . . . . . . . . . . . . . . 29 | 10. Comments Solicited | |||
| 11. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 29 | 11. References | |||
| 12. References . . . . . . . . . . . . . . . . . . . . . . . . . 30 | 11.1. Normative References | |||
| 12.1. Normative References . . . . . . . . . . . . . . . . . . 30 | 11.2. Informative References | |||
| 12.2. Informative References . . . . . . . . . . . . . . . . . 30 | Acknowledgements | |||
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 32 | Authors' Addresses | |||
| 1. Introduction | 1. Introduction | |||
| This informational document explains the specification of the queue | This Informational RFC explains the specification of the queue | |||
| protection (QProt) algorithm used in DOCSIS technology since version | protection (QProt) algorithm used in DOCSIS technology since version | |||
| 3.1 [DOCSIS]. | 3.1 [DOCSIS]. | |||
| Although the algorithm is defined in annex P of [DOCSIS], it relies | Although the algorithm is defined in Annex P of [DOCSIS], it relies | |||
| on cross-references to other parts of the set of specs. This | on cross references to other parts of the set of specifications. | |||
| document pulls all the strands together into one self-contained | This document pulls all the strands together into one self-contained | |||
| document. The core of the document is a similar pseudocode walk- | document. The core of the document is a similar pseudocode walk- | |||
| through to that in the DOCSIS spec, but it also includes additional | through to that in the DOCSIS specification, but it also includes | |||
| material: i) a brief overview; ii) a definition of how a data sender | additional material: | |||
| needs to behave to avoid triggering queue protection; and iii) a | ||||
| section giving the rationale for the design choices. | i. a brief overview, | |||
| ii. a definition of how a data sender needs to behave to avoid | ||||
| triggering queue protection, and | ||||
| iii. a section giving the rationale for the design choices. | ||||
| Low queuing delay depends on hosts sending their data smoothly, | Low queuing delay depends on hosts sending their data smoothly, | |||
| either at low rate or responding to explicit congestion notifications | either at a low rate or responding to Explicit Congestion | |||
| (ECN [RFC8311], [RFC9331]). So low queuing latency is something | Notifications (ECNs) (see [RFC8311] and [RFC9331]). So, low-latency | |||
| hosts create themselves, not something the network gives them. This | queuing is something hosts create themselves, not something the | |||
| tends to ensure that self-interest alone does not drive flows to mis- | network gives them. This tends to ensure that self-interest alone | |||
| mark their packets for the low latency queue. However, traffic from | does not drive flows to mis-mark their packets for the low-latency | |||
| an application that does not behave in a non-queue-building way might | queue. However, traffic from an application that does not behave in | |||
| erroneously be classified into a low latency queue, whether | a non-queue-building way might erroneously be classified into a low- | |||
| accidentally or maliciously. QProt protects other traffic in the low | latency queue, whether accidentally or maliciously. QProt protects | |||
| latency queue from the harm due to excess queuing that would | other traffic in the low-latency queue from the harm due to excess | |||
| otherwise be caused by such anomalous behaviour. | queuing that would otherwise be caused by such anomalous behavior. | |||
| In normal scenarios without misclassified traffic, QProt is not | In normal scenarios without misclassified traffic, QProt is not | |||
| expected to intervene at all in the classification or forwarding of | expected to intervene at all in the classification or forwarding of | |||
| packets. | packets. | |||
| An overview of how low latency support has been added to DOCSIS | An overview of how low-latency support has been added to DOCSIS | |||
| technology is given in [LLD]. In each direction of a DOCSIS link | technology is given in [LLD]. In each direction of a DOCSIS link | |||
| (upstream and downstream), there are two queues: one for Low Latency | (upstream and downstream), there are two queues: one for Low-Latency | |||
| (LL) and one for Classic traffic, in an arrangement similar to the | (LL) and one for Classic traffic, in an arrangement similar to the | |||
| IETF's Coupled DualQ AQM [RFC9332]. The two queues enable a | IETF's Coupled DualQ Active Queue Management (AQM) [RFC9332]. The | |||
| transition from 'Classic' to 'Scalable' congestion control so that | two queues enable a transition from "Classic" to "Scalable" | |||
| low latency can become the norm for any application, including ones | congestion control so that low latency can become the norm for any | |||
| seeking both full throughput and low latency, not just low-rate | application, including ones seeking both full throughput and low | |||
| applications that have been more traditionally associated with a low | latency, not just low-rate applications that have been more | |||
| latency service. The Classic queue is only necessary for traffic | traditionally associated with a low-latency service. The Classic | |||
| such as traditional (Reno/Cubic) TCP that needs about a round trip of | queue is only necessary for traffic such as traditional (Reno/Cubic) | |||
| buffering to fully utilize the link, and therefore has no incentive | TCP that needs about a round trip of buffering to fully utilize the | |||
| to mismark itself as low latency. The QProt function is located at | link, and therefore has no incentive to mismark itself as low | |||
| the ingress to the Low Latency queue. Therefore, in the upstream | latency. The QProt function is located at the ingress to the Low- | |||
| QProt is located on the cable modem (CM), and in the downstream it is | Latency queue. Therefore, in the upstream, QProt is located on the | |||
| located on the cable CMTS (CM Termination System). If an arriving | cable modem (CM); in the downstream, it is located on the CM | |||
| packet triggers queue protection, the QProt algorithm ejects the | Termination System (CMTS). If an arriving packet triggers queue | |||
| packet from the Low Latency queue and reclassifies it into the | protection, the QProt algorithm ejects the packet from the Low- | |||
| Classic queue. | Latency queue and reclassifies it into the Classic queue. | |||
| If QProt is used in settings other than DOCSIS links, it would be a | If QProt is used in settings other than DOCSIS links, it would be a | |||
| simple matter to detect queue-building flows by using slightly | simple matter to detect queue-building flows by using slightly | |||
| different conditions, and/or to trigger a different action as a | different conditions and/or to trigger a different action as a | |||
| consequence, as appropriate for the scenario, e.g., dropping instead | consequence, as appropriate for the scenario, e.g., dropping instead | |||
| of reclassifying packets or perhaps accumulating a second per-flow | of reclassifying packets or perhaps accumulating a second per-flow | |||
| score to decide whether to redirect a whole flow rather than just | score to decide whether to redirect a whole flow rather than just | |||
| certain packets. Such work is for future study and out of scope of | certain packets. Such work is for future study and out of scope of | |||
| the present document. | the present document. | |||
| The algorithm is based on a rigorous approach to quantifying how much | The QProt algorithm is based on a rigorous approach to quantifying | |||
| each flow contributes to congestion, which is used in economics to | how much each flow contributes to congestion, which is used in | |||
| allocate responsibility for the cost of one party's behaviour on | economics to allocate responsibility for the cost of one party's | |||
| others (the economic externality). Another important feature of the | behavior on others (the economic externality). Another important | |||
| approach is that the metric used for the queuing score is based on | feature of the approach is that the metric used for the queuing score | |||
| the same variable that determines the level of ECN signalling seen by | is based on the same variable that determines the level of ECN | |||
| the sender [RFC8311], [RFC9331]. This makes the internal queuing | signalling seen by the sender (see [RFC8311] and [RFC9331]). This | |||
| score visible externally as ECN markings. This transparency is | makes the internal queuing score visible externally as ECN markings. | |||
| necessary to be able to objectively state (in Section 3) how a flow | This transparency is necessary to be able to objectively state (in | |||
| can keep on the 'good' side of the algorithm. | Section 3) how a flow can keep on the "good" side of the algorithm. | |||
| 1.1. Document Roadmap | 1.1. Document Roadmap | |||
| The core of the document is the walk-through of the DOCSIS QProt | The core of the document is the walk-through of the DOCSIS QProt | |||
| algorithm's pseudocode in Section 4. | algorithm's pseudocode in Section 4. | |||
| Prior to that, Section 2 summarizes the approach used in the | Prior to that, Section 2 summarizes the approach used in the | |||
| algorithm, then Section 3 considers QProt from the perspective of the | algorithm. Then, Section 3 considers QProt from the perspective of | |||
| end-system, by defining the behaviour that a flow needs to comply | the end-system by defining the behavior that a flow needs to comply | |||
| with to avoid the QProt algorithm ejecting its packets from the low | with to avoid the QProt algorithm ejecting its packets from the low- | |||
| latency queue. | latency queue. | |||
| Section 5 gives deeper insight into the principles and rationale | Section 5 gives deeper insight into the principles and rationale | |||
| behind the algorithm. Then Section 6 explains the limitations of the | behind the algorithm. Then, Section 6 explains the limitations of | |||
| approach, followed by the usual closing sections. | the approach. The usual closing sections follow. | |||
| 1.2. Terminology | 1.2. Terminology | |||
| The normative language for the DOCSIS QProt algorithm is in the | The normative language for the DOCSIS QProt algorithm is in the | |||
| DOCSIS specs [DOCSIS], [DOCSIS-CM-OSS], [DOCSIS-CCAP-OSS] not in this | DOCSIS specifications [DOCSIS], [DOCSIS-CM-OSS], and | |||
| informational guide. If there is any inconsistency, the DOCSIS specs | [DOCSIS-CCAP-OSS]: not in this Informational RFC. If there is any | |||
| take precedence. | inconsistency, the DOCSIS specifications take precedence. | |||
| The following terms and abbreviations are used: | The following terms and abbreviations are used: | |||
| CM: Cable Modem | CM: Cable Modem | |||
| CMTS: CM Termination System | CMTS: CM Termination System | |||
| Congestion-rate: The transmission rate of bits or bytes contained | Congestion-rate: The transmission rate of bits or bytes contained | |||
| within packets of a flow that have the CE codepoint set in the IP- | within packets of a flow that have the CE codepoint set in the IP- | |||
| ECN field [RFC3168] (including IP headers unless specified | ECN field [RFC3168] (including IP headers unless specified | |||
| otherwise). Congestion-bit-rate and congestion-volume were | otherwise). Congestion-bit-rate and congestion-volume were | |||
| introduced in [RFC7713] and [RFC6789]. | introduced in [RFC7713] and [RFC6789]. | |||
| DOCSIS: Data Over Cable System Interface Specification. "DOCSIS" is | DOCSIS: Data-Over-Cable System Interface Specification. "DOCSIS" is | |||
| a registered trademark of Cable Television Laboratories, Inc. | a registered trademark of Cable Television Laboratories, Inc. | |||
| ("CableLabs"). | ("CableLabs"). | |||
| Non-queue-building: A flow that tends not to build a queue | Non-queue-building: A flow that tends not to build a queue. | |||
| Queue-building: A flow that builds a queue. If it is classified | Queue-building: A flow that builds a queue. If it is classified | |||
| into the Low Latency queue, it is therefore a candidate for the | into the Low-Latency queue, it is therefore a candidate for the | |||
| queue protection algorithm to detect and sanction. | queue protection algorithm to detect and sanction. | |||
| ECN: Explicit Congestion Notification | ECN: Explicit Congestion Notification | |||
| QProt: The Queue Protection function | QProt: The Queue Protection function | |||
| 1.3. Copyright Material | 1.3. Copyright Material | |||
| Parts of this document are reproduced from [DOCSIS] with kind | Parts of this document are reproduced from [DOCSIS] with kind | |||
| permission of the copyright holder, Cable Television Laboratories, | permission of the copyright holder, Cable Television Laboratories, | |||
| Inc. ("CableLabs"). | Inc. ("CableLabs"). | |||
| 2. Approach - In Brief | 2. Approach (In Brief) | |||
| The algorithm is divided into mechanism and policy. There is only a | The algorithm is divided into mechanism and policy. There is only a | |||
| tiny amount of policy code, but policy might need to be changed in | tiny amount of policy code, but policy might need to be changed in | |||
| the future. So, where hardware implementation is being considered, | the future. So, where hardware implementation is being considered, | |||
| it would be advisable to implement the policy aspects in firmware or | it would be advisable to implement the policy aspects in firmware or | |||
| software: | software: | |||
| * The mechanism aspects identify flows, maintain flow-state and | * The mechanism aspects identify flows, maintain flow-state, and | |||
| accumulate per-flow queuing scores; | accumulate per-flow queuing scores; | |||
| * The policy aspects can be divided into conditions and actions: | * The policy aspects can be divided into conditions and actions: | |||
| - The conditions are the logic that determines when action should | - The conditions are the logic that determines when action should | |||
| be taken to avert the risk of queuing delay becoming excessive; | be taken to avert the risk of queuing delay becoming excessive; | |||
| - The actions determine how this risk is averted, e.g., by | - The actions determine how this risk is averted, e.g., by | |||
| redirecting packets from a flow into another queue, or to | redirecting packets from a flow into another queue or by | |||
| reclassify a whole flow that seems to be misclassified. | reclassifying a whole flow that seems to be misclassified. | |||
| 2.1. Mechanism | 2.1. Mechanism | |||
| The algorithm maintains per-flow-state, where 'flow' usually means an | The algorithm maintains per-flow-state, where "flow" usually means an | |||
| end-to-end (layer-4) 5-tuple. The flow-state consists of a queuing | end-to-end (Layer 4) 5-tuple. The flow-state consists of a queuing | |||
| score that decays over time. Indeed it is transformed into time | score that decays over time. Indeed, it is transformed into time | |||
| units so that it represents the flow-state's own expiry time | units so that it represents the flow-state's own expiry time | |||
| (explained in Section 5.3). A higher queuing score pushes out the | (explained in Section 5.3). A higher queuing score pushes out the | |||
| expiry time further. | expiry time further. | |||
| Non-queue-building flows tend to release their flow-state rapidly --- | Non-queue-building flows tend to release their flow-state rapidly: it | |||
| it usually expires reasonably early in the gap between the packets of | usually expires reasonably early in the gap between the packets of a | |||
| a normal flow. Then the memory can be recycled for packets from | normal flow. Then, the memory can be recycled for packets from other | |||
| other flows that arrive in between. So only queue-building flows | flows that arrive in-between. So, only queue-building flows hold | |||
| hold flow state persistently. | flow state persistently. | |||
| The simplicity and effectiveness of the algorithm is due to the | The simplicity and effectiveness of the algorithm is due to the | |||
| definition of the queuing score. The queueing score represents the | definition of the queuing score. The queueing score represents the | |||
| share of blame for queuing that each flow bears. The scoring | share of blame for queuing that each flow bears. The scoring | |||
| algorithm uses the same internal variable, probNative, that the AQM | algorithm uses the same internal variable, probNative, that the AQM | |||
| for the low latency queue uses to ECN-mark packets (the other two | for the low-latency queue uses to ECN-mark packets. (The other two | |||
| forms of marking, Classic and coupled, are driven by Classic traffic | forms of marking, Classic and coupled, are driven by Classic traffic; | |||
| and therefore not relevant to protection of the LL queue). In this | therefore, they are not relevant to protection of the LL queue). In | |||
| way, the queuing score accumulates the size of each arriving packet | this way, the queuing score accumulates the size of each arriving | |||
| of a flow, but scaled by the value of probNative (in the range 0 to | packet of a flow but scaled by the value of probNative (in the range | |||
| 1) at the instant the packet arrives. So a flow's score accumulates | 0 to 1) at the instant the packet arrives. So, a flow's score | |||
| faster, the higher the degree of queuing and the faster that the | accumulates faster: | |||
| flow's packets arrive when there is queuing. Section 5.1 explains | ||||
| further why this score represents blame for queuing. | ||||
| The algorithm as described so far would accumulate a number that | * the higher the degree of queuing and | |||
| * the faster that the flow's packets arrive when there is queuing. | ||||
| Section 5.1 explains further why this score represents blame for | ||||
| queuing. | ||||
| The algorithm, as described so far, would accumulate a number that | ||||
| would rise at the so-called congestion-rate of the flow (see | would rise at the so-called congestion-rate of the flow (see | |||
| Terminology in Section 1.2), i.e., the rate at which the flow is | Section 1.2), i.e., the rate at which the flow is contributing to | |||
| contributing to congestion, or the rate at which the AQM is | congestion or the rate at which the AQM is forwarding bytes of the | |||
| forwarding bytes of the flow that are ECN marked. However, rather | flow that are ECN-marked. However, rather than growing continually, | |||
| than growing continually, the queuing score is also reduced (or | the queuing score is also reduced (or "aged") at a constant rate. | |||
| 'aged') at a constant rate. This is because it is unavoidable for | This is because it is unavoidable for capacity-seeking flows to | |||
| capacity-seeking flows to induce a continuous low level of congestion | induce a continuous low level of congestion as they track available | |||
| as they track available capacity. Section 5.2 explains why this | capacity. Section 5.2 explains why this allowance can be set to the | |||
| allowance can be set to the same constant for any scalable flow, | same constant for any scalable flow, whatever its bit rate. | |||
| whatever its bit rate. | ||||
| For implementation efficiency, the queuing score is transformed into | For implementation efficiency, the queuing score is transformed into | |||
| time units so that it represents the expiry time of the flow state | time units; this is so that it represents the expiry time of the flow | |||
| (as already discussed above). Then it does not need to be explicitly | state (as already discussed above). Then, it does not need to be | |||
| aged, because the natural passage of time implicitly 'ages' an expiry | explicitly aged because the natural passage of time implicitly "ages" | |||
| time. The transformation into time units simply involves dividing | an expiry time. The transformation into time units simply involves | |||
| the queuing score of each packet by the constant aging rate | dividing the queuing score of each packet by the constant aging rate | |||
| (explained further in Section 5.3). | (this is explained further in Section 5.3). | |||
| 2.2. Policy | 2.2. Policy | |||
| 2.2.1. Policy Conditions | 2.2.1. Policy Conditions | |||
| The algorithm uses the queuing score to determine whether to eject | The algorithm uses the queuing score to determine whether to eject | |||
| each packet only at the time it first arrives. This limits the | each packet only at the time it first arrives. This limits the | |||
| policies available. For instance, when queueing delay exceeds a | policies available. For instance, when queueing delay exceeds a | |||
| threshold, it is not possible to eject a packet from the flow with | threshold, it is not possible to eject a packet from the flow with | |||
| the highest queuing scoring, because that would involve searching the | the highest queuing scoring because that would involve searching the | |||
| queue for such a packet (if indeed one was still in the queue). | queue for such a packet (if, indeed, one were still in the queue). | |||
| Nonetheless, it is still possible to develop a policy that protects | Nonetheless, it is still possible to develop a policy that protects | |||
| the low latency of the queue by making the queuing score threshold | the low latency of the queue by making the queuing score threshold | |||
| stricter the greater the excess of queuing delay relative to the | stricter the greater the excess of queuing delay relative to the | |||
| threshold (explained in Section 5.4). | threshold (this is explained in Section 5.4). | |||
| 2.2.2. Policy Action | 2.2.2. Policy Action | |||
| In the DOCSIS QProt spec at the time of writing, when the policy | At the time of writing, the DOCSIS QProt specification states that | |||
| conditions are met the action taken to protect the low latency queue | when the policy conditions are met, the action taken to protect the | |||
| is to reclassify a packet into the Classic queue (justified in | low-latency queue is to reclassify a packet into the Classic queue | |||
| Section 5.5). | (this is justified in Section 5.5). | |||
| 3. Necessary Flow Behaviour | 3. Necessary Flow Behavior | |||
| The QProt algorithm described here can be used for responsive and/or | The QProt algorithm described here can be used for responsive and/or | |||
| unresponsive flows. | unresponsive flows. | |||
| * It is possible to objectively describe the least responsive way | * It is possible to objectively describe the least responsive way | |||
| that a flow will need to respond to congestion signals in order to | that a flow will need to respond to congestion signals in order to | |||
| avoid triggering queue protection, no matter the link capacity and | avoid triggering queue protection, no matter the link capacity and | |||
| no matter how much other traffic there is. | no matter how much other traffic there is. | |||
| * It is not possible to describe how fast or smooth an unresponsive | * It is not possible to describe how fast or smooth an unresponsive | |||
| flow should be to avoid queue protection, because this depends on | flow should be to avoid queue protection because this depends on | |||
| how much other traffic there is and the capacity of the link, | how much other traffic there is and the capacity of the link, | |||
| which an application is unable to know. However, the more | which an application is unable to know. However, the more | |||
| smoothly an unresponsive flow paces its packets and the lower its | smoothly an unresponsive flow paces its packets and the lower its | |||
| rate relative to typical broadband link capacities, the less | rate relative to typical broadband link capacities, the less | |||
| likelihood that it will risk causing enough queueing to trigger | likelihood that it will risk causing enough queueing to trigger | |||
| queue protection. | queue protection. | |||
| Responsive low latency flows can use an L4S ECN codepoint [RFC9331] | Responsive low-latency flows can use a Low Latency, Low Loss, and | |||
| to get classified into the low latency queue. | Scalable throughput (L4S) ECN codepoint [RFC9331] to get classified | |||
| into the low-latency queue. | ||||
| A sender can arrange for flows that are smooth but do not respond to | A sender can arrange for flows that are smooth but do not respond to | |||
| ECN marking to be classified into the low latency queue by using the | ECN marking to be classified into the low-latency queue by using the | |||
| Non-Queue-Building (NQB) Diffserv codepoint [I-D.ietf-tsvwg-nqb], | Non-Queue-Building (NQB) Diffserv codepoint [RFC9956], which the | |||
| which the DOCSIS specs support, or an operator can use various other | DOCSIS specifications support, or an operator can use various other | |||
| local classifiers. | local classifiers. | |||
| As already explained in Section 2.1, the QProt algorithm is driven | As already explained in Section 2.1, the QProt algorithm is driven | |||
| from the same variable that drives the ECN marking probability in the | from the same variable that drives the ECN-marking probability in the | |||
| low latency or 'LL' queue (the 'Native' AQM of the LL queue is | low-latency or "LL" queue (the "Native" AQM of the LL queue is | |||
| defined in the Immediate Active Queue Management Annex of [DOCSIS]). | defined in the Immediate Active Queue Management Annex of [DOCSIS]). | |||
| The algorithm that calculates this internal variable is run on the | The algorithm that calculates this internal variable is run on the | |||
| arrival of every packet, whether it is ECN-capable or not, so that it | arrival of every packet, whether or not it is ECN-capable, so that it | |||
| can be used by the QProt algorithm. But the variable is only used to | can be used by the QProt algorithm. But the variable is only used to | |||
| ECN-mark packets that are ECN-capable. | ECN-mark packets that are ECN-capable. | |||
| Not only does this dual use of the variable improve processing | Not only does this dual use of the variable improve processing | |||
| efficiency, but it also makes the basis of the QProt algorithm | efficiency, but it also makes the basis of the QProt algorithm | |||
| visible and transparent, at least for responsive ECN-capable flows. | visible and transparent, at least for responsive ECN-capable flows. | |||
| Then it is possible to state objectively that a flow can avoid | Then, it is possible to state objectively that a flow can avoid | |||
| triggering queue protection by keeping the bit rate of ECN marked | triggering queue protection by keeping the bit rate of ECN-marked | |||
| packets (the congestion-rate) below AGING, which is a configured | packets (the congestion-rate) below AGING, which is a configured | |||
| constant of the algorithm (default 2^19 B/s ~= 4 Mb/s). Note that it | constant of the algorithm (default 2^19 B/s ~= 4 Mb/s). Note that it | |||
| is in a congestion controller's own interest to keep its average | is in a congestion controller's own interest to keep its average | |||
| congestion-rate well below this level (e.g., ~1 Mb/s), to ensure that | congestion-rate well below this level (e.g., ~1 Mb/s) to ensure that | |||
| it does not trigger queue protection during transient dynamics. | it does not trigger queue protection during transient dynamics. | |||
| If the QProt algorithm is used in other settings, it would still need | If the QProt algorithm is used in other settings, it would still need | |||
| to be based on the visible level of congestion signalling, in a | to be based on the visible level of congestion signalling, in a | |||
| similar way to the DOCSIS approach. Without transparency of the | similar way to the DOCSIS approach. Without transparency of the | |||
| basis of the algorithm's decisions, end-systems would not be able to | basis of the algorithm's decisions, end-systems would not be able to | |||
| avoid triggering queue protection on an objective basis. | avoid triggering queue protection on an objective basis. | |||
| 4. Pseudocode Walk-Through | 4. Pseudocode Walk-Through | |||
| 4.1. Input Parameters, Constants and Variables | ||||
| 4.1. Input Parameters, Constants, and Variables | ||||
| The operator input parameters that set the parameters in the first | The operator input parameters that set the parameters in the first | |||
| two blocks of pseudocode below are defined for cable modems (CMs) in | two blocks of pseudocode below are defined for cable modems (CMs) in | |||
| [DOCSIS-CM-OSS] and for CMTSs in [DOCSIS-CCAP-OSS]. Then, further | [DOCSIS-CM-OSS] and for CMTSs in [DOCSIS-CCAP-OSS]. Then, further | |||
| constants are either derived from the input parameters or hard-coded. | constants are either derived from the input parameters or hard-coded. | |||
| Defaults and units are shown in square brackets. Defaults (or indeed | Defaults and units are shown in square brackets. Defaults (or indeed | |||
| any aspect of the algorithm) are subject to change, so the latest | any aspect of the algorithm) are subject to change, so the latest | |||
| DOCSIS specs are the definitive references. Also any operator might | DOCSIS specifications are the definitive references. Also, any | |||
| set certain parameters to non-default values. | operator might set certain parameters to non-default values. | |||
| <CODE BEGINS> | <CODE BEGINS> | |||
| // Input Parameters | // Input Parameters | |||
| MAX_RATE; // Configured maximum sustained rate [b/s] | MAX_RATE; // Configured maximum sustained rate [b/s] | |||
| QPROTECT_ON; // Queue Protection is enabled [Default: TRUE] | QPROTECT_ON; // Queue Protection is enabled [Default: TRUE] | |||
| CRITICALqL_us; // LL queue threshold delay [us] Default: MAXTH_us | CRITICALqL_us; // LL queue threshold delay [us] Default: MAXTH_us | |||
| CRITICALqLSCORE_us;// The threshold queuing score [Default: 4000us] | CRITICALqLSCORE_us;// The threshold queuing score [Default: 4000 µs] | |||
| LG_AGING; // The aging rate of the q'ing score [Default: 19] | LG_AGING; // The aging rate of the q'ing score [Default: 19] | |||
| // as log base 2 of the congestion-rate [lg(B/s)] | // as log base 2 of the congestion-rate [lg(B/s)] | |||
| // Input Parameters for the calcProbNative() algorithm: | // Input Parameters for the calcProbNative() algorithm: | |||
| MAXTH_us; // Max LL AQM marking threshold [Default: 1000us] | MAXTH_us; // Max LL AQM marking threshold [Default: 1000 µs] | |||
| LG_RANGE; // Log base 2 of the range of ramp [lg(ns)] | LG_RANGE; // Log base 2 of the range of ramp [lg(ns)] | |||
| // Default: 2^19 = 524288 ns (roughly 525 us) | // Default: 2^19 = 524288 ns (roughly 525 µs) | |||
| <CODE ENDS> | <CODE ENDS> | |||
| <CODE BEGINS> | <CODE BEGINS> | |||
| // Constants, either derived from input parameters or hard-coded | // Constants, either derived from input parameters or hard-coded | |||
| T_RES; // Resolution of t_exp [ns] | T_RES; // Resolution of t_exp [ns] | |||
| // Convert units (approx) | // Convert units (approx) | |||
| AGING = pow(2, (LG_AGING-30) ) * T_RES; // lg([B/s]) to [B/T_RES] | AGING = pow(2, (LG_AGING-30) ) * T_RES; // lg([B/s]) to [B/T_RES] | |||
| CRITICALqL = CRITICALqL_us * 1000; // [us] to [ns] | CRITICALqL = CRITICALqL_us * 1000; // [us] to [ns] | |||
| CRITICALqLSCORE = CRITICALqLSCORE_us * 1000/T_RES; // [us] to [T_RES] | CRITICALqLSCORE = CRITICALqLSCORE_us * 1000/T_RES; // [us] to [T_RES] | |||
| // Threshold for the q'ing score condition | // Threshold for the q'ing score condition | |||
| CRITICALqLPRODUCT = CRITICALqL * CRITICALqLSCORE; | CRITICALqLPRODUCT = CRITICALqL * CRITICALqLSCORE; | |||
| qLSCORE_MAX = 5E9 / T_RES; // Max queuing score = 5 s | qLSCORE_MAX = 5E9 / T_RES; // Max queuing score = 5 s | |||
| skipping to change at page 10, line 36 ¶ | skipping to change at line 437 ¶ | |||
| MAXTH = MAXTH_us * 1000; // Max marking threshold [ns] | MAXTH = MAXTH_us * 1000; // Max marking threshold [ns] | |||
| MAX_FRAME_SIZE = 2000; // DOCSIS-wide constant [B] | MAX_FRAME_SIZE = 2000; // DOCSIS-wide constant [B] | |||
| // Minimum marking threshold of 2 MTU for slow links [ns] | // Minimum marking threshold of 2 MTU for slow links [ns] | |||
| FLOOR = 2 * 8 * MAX_FRAME_SIZE * 10^9 / MAX_RATE; | FLOOR = 2 * 8 * MAX_FRAME_SIZE * 10^9 / MAX_RATE; | |||
| RANGE = (1 << LG_RANGE); // Range of ramp [ns] | RANGE = (1 << LG_RANGE); // Range of ramp [ns] | |||
| MINTH = max ( MAXTH - RANGE, FLOOR); | MINTH = max ( MAXTH - RANGE, FLOOR); | |||
| MAXTH = MINTH + RANGE; // Max marking threshold [ns] | MAXTH = MINTH + RANGE; // Max marking threshold [ns] | |||
| <CODE ENDS> | <CODE ENDS> | |||
| Throughout the pseudocode, most variables are integers. The only | Throughout the pseudocode, most variables are integers. The only | |||
| exceptions are floating point variables representing probabilities | exceptions are floating-point variables representing probabilities | |||
| (MAX_PROB and probNative) and the AGING parameter. The actual DOCSIS | (MAX_PROB and probNative) and the AGING parameter. The actual DOCSIS | |||
| QProt algorithm is defined using integer arithmetic, but in the | QProt algorithm is defined using integer arithmetic, but in the | |||
| floating point arithmetic used in this document, (0 <= probNative <= | floating-point arithmetic used in this document, (0 <= probNative <= | |||
| 1). Also, the pseudocode omits overflow checking and it would need | 1). Also, the pseudocode omits overflow checking, and it would need | |||
| to be made robust to non-default input parameters. | to be made robust to non-default input parameters. | |||
| The resolution for expressing time, T_RES, needs to be chosen to | The resolution for expressing time, T_RES, needs to be chosen to | |||
| ensure that expiry times for buckets can represent times that are a | ensure that expiry times for buckets can represent times that are a | |||
| fraction (e.g., 1/10) of the expected packet interarrival time for | fraction (e.g., 1/10) of the expected packet interarrival time for | |||
| the system. | the system. | |||
| The following definitions explain the purpose of important variables | The following definitions explain the purpose of important variables | |||
| and functions. | and functions. | |||
| <CODE BEGINS> | <CODE BEGINS> | |||
| // Public variables: | // Public variables: | |||
| qdelay; // The current queuing delay of the LL queue [ns] | qdelay; // The current queuing delay of the LL queue [ns] | |||
| probNative; // Native marking probability of LL queue within [0,1] | probNative; // Native marking probability of LL queue within [0,1] | |||
| // External variables | // External variables | |||
| packet; // The structure holding packet header fields | packet; // The structure holding packet header fields | |||
| packet.size; // The size of the current packet [B] | packet.size; // The size of the current packet [B] | |||
| packet.uflow; // The flow identifier of the current packet | packet.uflow; // The flow identifier of the current packet | |||
| // (e.g., 5-tuple or 4-tuple if IPSec) | // (e.g., 5-tuple or 4-tuple if IPsec) | |||
| // Irrelevant details of DOCSIS function to return qdelay are removed | // Irrelevant details of DOCSIS function to return qdelay are removed | |||
| qdelayL(...) // Returns current delay of the low latency Q [ns] | qdelayL(...) // Returns current delay of the low-latency Q [ns] | |||
| <CODE ENDS> | <CODE ENDS> | |||
| Pseudocode for how the algorithm categorizes packets by flow ID to | Pseudocode for how the algorithm categorizes packets by flow ID to | |||
| populate the variable packet.uflow is not given in detail here. The | populate the variable packet.uflow is not given in detail here. The | |||
| application's flow ID is usually defined by a common 5-tuple (or | application's flow ID is usually defined by a common 5-tuple (or | |||
| 4-tuple) of: | 4-tuple) of: | |||
| * source and destination IP addresses of the innermost IP header | * source and destination IP addresses of the innermost IP header | |||
| found; | found; | |||
| * the protocol (IPv4) or next header (IPv6) field in this IP header | * the protocol (IPv4) or next header (IPv6) field in this IP header | |||
| * either of: | * either of: | |||
| - source and destination port numbers, for TCP, UDP, UDP-Lite, | - source and destination port numbers, for TCP, UDP, UDP-Lite, | |||
| SCTP, DCCP, etc. | Stream Control Transmission Protocol (SCTP), Datagram | |||
| Congestion Control Protocol (DCCP), etc. | ||||
| - Security Parameters Index (SPI) for IPSec Encapsulating | - Security Parameter Index (SPI) for IPsec Encapsulating Security | |||
| Security Payload (ESP) [RFC4303]. | Payload (ESP) [RFC4303]. | |||
| The Microflow Classification section of the Queue Protection Annex of | The Microflow Classification section of the Queue Protection Annex of | |||
| the DOCSIS spec [DOCSIS] defines various strategies to find these | the DOCSIS specification [DOCSIS] defines various strategies to find | |||
| headers by skipping extension headers or encapsulations. If they | these headers by skipping extension headers or encapsulations. If | |||
| cannot be found, the spec defines various less-specific 3-tuples that | they cannot be found, the specification defines various less-specific | |||
| would be used. The DOCSIS spec should be referred to for all these | 3-tuples that would be used. The DOCSIS specification should be | |||
| strategies, which will not be repeated here. | referred to for all these strategies, which will not be repeated | |||
| here. | ||||
| The array of bucket structures defined below is used by all the Queue | The array of bucket structures defined below is used by all the Queue | |||
| Protection functions: | Protection functions: | |||
| <CODE BEGINS> | <CODE BEGINS> | |||
| struct bucket { // The leaky bucket structure to hold per-flow state | struct bucket { // The leaky bucket structure to hold per-flow state | |||
| id; // identifier (e.g., 5-tuple) of flow using bucket | id; // identifier (e.g., 5-tuple) of flow using bucket | |||
| t_exp; // expiry time in units of T_RES | t_exp; // expiry time in units of T_RES | |||
| // (t_exp - now) = flow's transformed q'ing score | // (t_exp - now) = flow's transformed q'ing score | |||
| }; | }; | |||
| skipping to change at page 12, line 22 ¶ | skipping to change at line 514 ¶ | |||
| <CODE ENDS> | <CODE ENDS> | |||
| 4.2. Queue Protection Data Path | 4.2. Queue Protection Data Path | |||
| All the functions of Queue Protection operate on the data path, | All the functions of Queue Protection operate on the data path, | |||
| driven by packet arrivals. | driven by packet arrivals. | |||
| The following functions that maintain per-flow queuing scores and | The following functions that maintain per-flow queuing scores and | |||
| manage per-flow state are considered primarily as mechanism: | manage per-flow state are considered primarily as mechanism: | |||
| pick_bucket(uflow_id); // Returns bucket identifier | * pick_bucket(uflow_id); // Returns bucket identifier | |||
| fill_bucket(bucket_id, pkt_size, probNative); // Returns queuing | * fill_bucket(bucket_id, pkt_size, probNative); // Returns queuing | |||
| score | score | |||
| calcProbNative(qdelay) // Returns ECN-marking probability of the | * calcProbNative(qdelay) // Returns ECN-marking probability of the | |||
| native LL AQM | native LL AQM | |||
| The following function is primarily concerned with policy: | The following function is primarily concerned with policy: | |||
| qprotect(packet, ...); // Returns exit status to either forward or | * qprotect(packet, ...); // Returns exit status to either forward or | |||
| redirect the packet | redirect the packet | |||
| ('...' suppresses distracting detail.) | ('...' suppresses distracting detail.) | |||
| Future modifications to policy aspects are more likely than to | Future modifications to policy aspects are more likely than | |||
| mechanisms. Therefore, policy aspects would be less appropriate | modifications to mechanisms. Therefore, policy aspects would be less | |||
| candidates for any hardware acceleration. | appropriate candidates for any hardware acceleration. | |||
| The entry point to these functions is qprotect(), which is called | The entry point to these functions is qprotect(), which is called | |||
| from packet classification before each packet is enqueued into the | from packet classification before each packet is enqueued into the | |||
| appropriate queue, queue_id, as follows: | appropriate queue, queue_id, as follows: | |||
| <CODE BEGINS> | <CODE BEGINS> | |||
| classifier(packet) { | classifier(packet) { | |||
| // Determine which queue using ECN, DSCP and any local-use fields | // Determine which queue using ECN, DSCP, and any local-use fields | |||
| queue_id = classify(packet); | queue_id = classify(packet); | |||
| // LQ & CQ are macros for valid queue IDs returned by classify() | // LQ & CQ are macros for valid queue IDs returned by classify() | |||
| if (queue_id == LQ) { | if (queue_id == LQ) { | |||
| // if packet classified to Low Latency Service Flow | // if packet classified to Low-Latency Service Flow | |||
| if (QPROTECT_ON) { | if (QPROTECT_ON) { | |||
| if (qprotect(packet, ...) == EXIT_SANCTION) { | if (qprotect(packet, ...) == EXIT_SANCTION) { | |||
| // redirect packet to Classic Service Flow | // redirect packet to Classic Service Flow | |||
| queue_id = CQ; | queue_id = CQ; | |||
| } | } | |||
| } | } | |||
| return queue_id; | return queue_id; | |||
| } | } | |||
| <CODE ENDS> | <CODE ENDS> | |||
| 4.2.1. The qprotect() function | 4.2.1. The qprotect() Function | |||
| On each packet arrival at the LL queue, qprotect() measures the | On each packet arrival at the LL queue, qprotect() measures the | |||
| current delay of the LL queue and derives the native LL marking | current delay of the LL queue and derives the native LL marking | |||
| probability from it. Then it uses pick_bucket to find the bucket | probability from it. Then, it uses pick_bucket to find the bucket | |||
| already holding the flow's state, or to allocate a new bucket if the | already holding the flow's state or to allocate a new bucket if the | |||
| flow is new or its state has expired (the most likely case). Then | flow is new or its state has expired (the most likely case). Then, | |||
| the queuing score is updated by the fill_bucket() function. That | the queuing score is updated by the fill_bucket() function. That | |||
| completes the mechanism aspects. | completes the mechanism aspects. | |||
| The comments against the subsequent policy conditions and actions | The comments against the subsequent policy conditions and actions | |||
| should be self-explanatory at a superficial level. The deeper | should be self-explanatory at a superficial level. The deeper | |||
| rationale for these conditions is given in Section 5.4. | rationale for these conditions is given in Section 5.4. | |||
| <CODE BEGINS> | <CODE BEGINS> | |||
| // Per packet queue protection | // Per-packet queue protection | |||
| qprotect(packet, ...) { | qprotect(packet, ...) { | |||
| bckt_id; // bucket index | bckt_id; // bucket index | |||
| qLscore; // queuing score of pkt's flow in units of T_RES | qLscore; // queuing score of pkt's flow in units of T_RES | |||
| qdelay = qL.qdelay(...); | qdelay = qL.qdelay(...); | |||
| probNative = calcProbNative(qdelay); | probNative = calcProbNative(qdelay); | |||
| bckt_id = pick_bucket(packet.uflow); | bckt_id = pick_bucket(packet.uflow); | |||
| qLscore = fill_bucket(buckets[bckt_id], packet.size, probNative); | qLscore = fill_bucket(buckets[bckt_id], packet.size, probNative); | |||
| skipping to change at page 14, line 33 ¶ | skipping to change at line 596 ¶ | |||
| // or qLSCORE_MAX reached | // or qLSCORE_MAX reached | |||
| || ( qLscore >= qLSCORE_MAX ) ) | || ( qLscore >= qLSCORE_MAX ) ) | |||
| return EXIT_SANCTION; | return EXIT_SANCTION; | |||
| else | else | |||
| return EXIT_SUCCESS; | return EXIT_SUCCESS; | |||
| } | } | |||
| <CODE ENDS> | <CODE ENDS> | |||
| 4.2.2. The pick_bucket() function | 4.2.2. The pick_bucket() Function | |||
| The pick_bucket() function is optimized for flow-state that will | The pick_bucket() function is optimized for flow-state that will | |||
| normally have expired from packet to packet of the same flow. It is | normally have expired from packet to packet of the same flow. It is | |||
| just one way of finding the bucket associated with the flow ID of | just one way of finding the bucket associated with the flow ID of | |||
| each packet - it might be possible to develop more efficient | each packet: it might be possible to develop more efficient | |||
| alternatives. | alternatives. | |||
| The algorithm is arranged so that the bucket holding any live (non- | The algorithm is arranged so that the bucket holding any live (non- | |||
| expired) flow-state associated with a packet will always be found | expired) flow-state associated with a packet will always be found | |||
| before a new bucket is allocated. The constant ATTEMPTS, defined | before a new bucket is allocated. The constant ATTEMPTS, defined | |||
| earlier, determines how many hashes are used to find a bucket for | earlier, determines how many hashes are used to find a bucket for | |||
| each flow (actually, only one hash is generated; then, by default, 5 | each flow. (Actually, only one hash is generated; then, by default, | |||
| bits of it at a time are used as the hash value, because by default | 5 bits of it at a time are used as the hash value because, by | |||
| there are 2^5 = 32 buckets). | default, there are 2^5 = 32 buckets). | |||
| The algorithm stores the flow's own ID in its flow-state. So, when a | The algorithm stores the flow's own ID in its flow-state. So, when a | |||
| packet of a flow arrives, the algorithm tries up to ATTEMPTS times to | packet of a flow arrives, the algorithm tries up to ATTEMPTS times to | |||
| hash to a bucket, looking for the flow's own ID. If found, it uses | hash to a bucket, looking for the flow's own ID. If found, it uses | |||
| that bucket, first resettings the expiry time to 'now' if it has | that bucket, first resetting the expiry time to "now" if it has | |||
| expired. | expired. | |||
| If it does not find the flow's ID, and the expiry time is still | If it does not find the flow's ID, and the expiry time is still | |||
| current, the algorithm can tell that another flow is using that | current, the algorithm can tell that another flow is using that | |||
| bucket, and it continues to look for a bucket for the flow. Even if | bucket, and it continues to look for a bucket for the flow. Even if | |||
| it finds another flow's bucket where the expiry time has passed, it | it finds another flow's bucket where the expiry time has passed, it | |||
| doesn't immediately use it. It merely remembers it as the potential | doesn't immediately use it. It merely remembers it as the potential | |||
| bucket to use. But first it runs through all the ATTEMPTS hashes to | bucket to use. But first it runs through all the ATTEMPTS hashes to | |||
| look for a bucket assigned to the flow ID. Then, if a live bucket is | look for a bucket assigned to the flow ID. Then, if a live bucket is | |||
| not already associated with the packet's flow, the algorithm should | not already associated with the packet's flow, the algorithm should | |||
| skipping to change at page 16, line 19 ¶ | skipping to change at line 655 ¶ | |||
| now; // current time | now; // current time | |||
| j; // loop counter | j; // loop counter | |||
| h32; // holds hash of the packet's flow IDs | h32; // holds hash of the packet's flow IDs | |||
| h; // bucket index being checked | h; // bucket index being checked | |||
| hsav; // interim chosen bucket index | hsav; // interim chosen bucket index | |||
| h32 = hash32(uflw); // 32-bit hash of flow ID | h32 = hash32(uflw); // 32-bit hash of flow ID | |||
| hsav = NBUCKETS; // Default bucket | hsav = NBUCKETS; // Default bucket | |||
| now = get_time_now(); // in units of T_RES | now = get_time_now(); // in units of T_RES | |||
| // The for loop checks ATTEMPTS buckets for ownership by flow-ID | // The for loop checks ATTEMPTS buckets for ownership by flow ID | |||
| // It also records the 1st bucket, if any, that could be recycled | // It also records the 1st bucket, if any, that could be recycled | |||
| // because it's expired. | // because it's expired. | |||
| // Must not recycle a bucket until all ownership checks completed | // Must not recycle a bucket until all ownership checks completed | |||
| for (j=0; j<ATTEMPTS; j++) { | for (j=0; j<ATTEMPTS; j++) { | |||
| // Use least signif. BI_SIZE bits of hash for each attempt | // Use least signif. BI_SIZE bits of hash for each attempt | |||
| h = h32 & MASK; | h = h32 & MASK; | |||
| if (buckets[h].id == uflw) { // Once uflw's bucket found... | if (buckets[h].id == uflw) { // Once uflw's bucket found... | |||
| if (buckets[h].t_exp <= now) // ...if bucket has expired... | if (buckets[h].t_exp <= now) // ...if bucket has expired... | |||
| buckets[h].t_exp = now; // ...reset it | buckets[h].t_exp = now; // ...reset it | |||
| return h; // Either way, use it | return h; // Either way, use it | |||
| } | } | |||
| else if ( (hsav == NBUCKETS) // If not seen expired bucket yet | else if ( (hsav == NBUCKETS) // If not seen expired bucket yet | |||
| // and this bucket has expired | // and this bucket has expired | |||
| && (buckets[h].t_exp <= now) ) { | && (buckets[h].t_exp <= now) ) { | |||
| hsav = h; // set it as the interim bucket | hsav = h; // set it as the interim bucket | |||
| } | } | |||
| h32 >>= BI_SIZE; // Bit-shift hash for next attempt | h32 >>= BI_SIZE; // Bit-shift hash for next attempt | |||
| } | } | |||
| // If reached here, no tested bucket was owned by the flow-ID | // If reached here, no tested bucket was owned by the flow ID | |||
| if (hsav != NBUCKETS) { | if (hsav != NBUCKETS) { | |||
| // If here, found an expired bucket within the above for loop | // If here, found an expired bucket within the above for loop | |||
| buckets[hsav].t_exp = now; // Reset expired bucket | buckets[hsav].t_exp = now; // Reset expired bucket | |||
| } else { | } else { | |||
| // If here, we're having to use the default bucket (the dregs) | // If here, we're having to use the default bucket (the dregs) | |||
| if (buckets[hsav].t_exp <= now) { // If dregs has expired... | if (buckets[hsav].t_exp <= now) { // If dregs has expired... | |||
| buckets[hsav].t_exp = now; // ...reset it | buckets[hsav].t_exp = now; // ...reset it | |||
| } | } | |||
| } | } | |||
| buckets[hsav].id = uflw; // In either case, claim for recycling | buckets[hsav].id = uflw; // In either case, claim for recycling | |||
| return hsav; | return hsav; | |||
| } | } | |||
| <CODE ENDS> | <CODE ENDS> | |||
| 4.2.3. The fill_bucket() function | 4.2.3. The fill_bucket() Function | |||
| The fill_bucket() function both accumulates and ages the queuing | The fill_bucket() function both accumulates and ages the queuing | |||
| score over time, as outlined in Section 2.1. To make aging the score | score over time, as outlined in Section 2.1. To make aging the score | |||
| efficient, the increment of the queuing score is transformed into | efficient, the increment of the queuing score is transformed into | |||
| units of time by dividing by AGING, so that the result represents the | units of time by dividing by AGING so that the result represents the | |||
| new expiry time of the flow. | new expiry time of the flow. | |||
| Given that probNative is already used to select which packets to ECN- | Given that probNative is already used to select which packets to ECN- | |||
| mark, it might be thought that the queuing score could just be | mark, it might be thought that the queuing score could just be | |||
| incremented by the full size of each selected packet, instead of | incremented by the full size of each selected packet, instead of | |||
| incrementing it by the product of every packet's size (pkt_sz) and | incrementing it by the product of every packet's size (pkt_sz) and | |||
| probNative. However, the unpublished experience of one of the | probNative. However, the unpublished experience of one of the | |||
| authors with other congestion policers has found that the score then | authors with other congestion policers has found that the score then | |||
| increments far too jumpily, particularly when probNative is low. | increments far too jumpily, particularly when probNative is low. | |||
| skipping to change at page 17, line 36 ¶ | skipping to change at line 720 ¶ | |||
| now = get_time_now(); // in units of T_RES | now = get_time_now(); // in units of T_RES | |||
| // Add packet's queuing score | // Add packet's queuing score | |||
| // For integer arithmetic, a bit-shift can replace the division | // For integer arithmetic, a bit-shift can replace the division | |||
| qLscore = min(buckets[bckt_id].t_exp - now | qLscore = min(buckets[bckt_id].t_exp - now | |||
| + probNative * pkt_sz / AGING, qLSCORE_MAX); | + probNative * pkt_sz / AGING, qLSCORE_MAX); | |||
| buckets[bckt_id].t_exp = now + qLscore; | buckets[bckt_id].t_exp = now + qLscore; | |||
| return qLscore; | return qLscore; | |||
| } | } | |||
| <CODE ENDS> | <CODE ENDS> | |||
| 4.2.4. The calcProbNative() function | 4.2.4. The calcProbNative() Function | |||
| To derive this queuing score, the QProt algorithm uses the linear | To derive this queuing score, the QProt algorithm uses the linear | |||
| ramp function calcProbNative() to normalize instantaneous queuing | ramp function calcProbNative() to normalize instantaneous queuing | |||
| delay of the LL queue into a probability in the range [0,1], which it | delay of the LL queue into a probability in the range [0,1], which it | |||
| assigns to probNative. | assigns to probNative. | |||
| <CODE BEGINS> | <CODE BEGINS> | |||
| calcProbNative(qdelay){ | calcProbNative(qdelay){ | |||
| if ( qdelay >= MAXTH ) { | if ( qdelay >= MAXTH ) { | |||
| probNative = MAX_PROB; | probNative = MAX_PROB; | |||
| skipping to change at page 18, line 21 ¶ | skipping to change at line 743 ¶ | |||
| // In practice, the * and the / would use a bit-shift | // In practice, the * and the / would use a bit-shift | |||
| } else { | } else { | |||
| probNative = 0; | probNative = 0; | |||
| } | } | |||
| return probNative; | return probNative; | |||
| } | } | |||
| <CODE ENDS> | <CODE ENDS> | |||
| 5. Rationale | 5. Rationale | |||
| 5.1. Rationale: Blame for Queuing, not for Rate in Itself | 5.1. Rationale: Blame for Queuing, Not for Rate in Itself | |||
| Figure 1 shows the bit rates of two flows as stacked areas. It poses | Figure 1 shows the bit rates of two flows as stacked areas. It poses | |||
| the question of which flow is more to blame for queuing delay; the | the question of which flow is more to blame for queuing delay: the | |||
| unresponsive constant bit rate flow (c) that is consuming about 80% | unresponsive constant bit rate flow (c) that is consuming about 80% | |||
| of the capacity, or the flow sending regular short unresponsive | of the capacity or the flow sending regular short unresponsive bursts | |||
| bursts (b)? The smoothness of c seems better for avoiding queuing, | (b)? The smoothness of c seems better for avoiding queuing, but its | |||
| but its high rate does not. However, if flow c was not there, or ran | high rate does not. However, if flow c were not there, or ran | |||
| slightly more slowly, b would not cause any queuing. | slightly more slowly, b would not cause any queuing. | |||
| ^ bit rate (stacked areas) | ^ bit rate (stacked areas) | |||
| | ,-. ,-. ,-. ,-. ,-. | | ,-. ,-. ,-. ,-. ,-. | |||
| |--|b|----------|b|----------|b|----------|b|----------|b|---Capacity | |--|b|----------|b|----------|b|----------|b|----------|b|---Capacity | |||
| |__|_|__________|_|__________|_|__________|_|__________|_|_____ | |__|_|__________|_|__________|_|__________|_|__________|_|_____ | |||
| | | | | |||
| | c | | c | |||
| | | | | |||
| | | | | |||
| | | | | |||
| +----------------------------------------------------------------> | +----------------------------------------------------------------> | |||
| time | time | |||
| Figure 1: Which is More to Blame for Queuing Delay? | Figure 1: Which is more to blame for queuing delay? | |||
| To explain queuing scores, in the following it will initially be | To explain queuing scores, in the following it will initially be | |||
| assumed that the QProt algorithm is accumulating queuing scores, but | assumed that the QProt algorithm is accumulating queuing scores but | |||
| not taking any action as a result. | not taking any action as a result. | |||
| To quantify the responsibility that each flow bears for queuing | To quantify the responsibility that each flow bears for queuing | |||
| delay, the QProt algorithm accumulates the product of the rate of | delay, the QProt algorithm accumulates the product of the rate of | |||
| each flow and the level of congestion, both measured at the instant | each flow and the level of congestion, both measured at the instant | |||
| each packet arrives. The instantaneous flow rate is represented at | each packet arrives. The instantaneous flow rate is represented at | |||
| each discrete event when a packet arrives by the packet's size, which | each discrete event when a packet arrives by the packet's size, which | |||
| accumulates faster the more packets arrive within each unit of time. | accumulates faster the more packets arrive within each unit of time. | |||
| The level of congestion is normalized to a dimensionless number | The level of congestion is normalized to a dimensionless number | |||
| between 0 and 1 (probNative). This fractional congestion level is | between 0 and 1 (probNative). This fractional congestion level is | |||
| skipping to change at page 19, line 22 ¶ | skipping to change at line 792 ¶ | |||
| * to be able to ignore very low levels of queuing that contribute | * to be able to ignore very low levels of queuing that contribute | |||
| insignificantly to delay | insignificantly to delay | |||
| * to be able to erect a steep barrier against excessive queuing | * to be able to erect a steep barrier against excessive queuing | |||
| delay | delay | |||
| The unit of the resulting queue score is "congested-bytes" per | The unit of the resulting queue score is "congested-bytes" per | |||
| second, which distinguishes it from just bytes per second. | second, which distinguishes it from just bytes per second. | |||
| Then, during the periods between bursts (b), neither flow accumulates | Then, during the periods between bursts (b), neither flow accumulates | |||
| any queuing score - the high rate of c is benign. But, during each | any queuing score: the high rate of c is benign. But, during each | |||
| burst, if we say the rate of c and b are 80% and 45% of capacity, | burst, if we say the rate of c and b are 80% and 45% of capacity, | |||
| thus causing 25% overload, they each bear (80/125)% and (45/125)% of | thus causing 25% overload, they each bear (80/125)% and (45/125)% of | |||
| the responsibility for the queuing delay (64% and 36%). The | the responsibility for the queuing delay (64% and 36%). The | |||
| algorithm does not explicitly calculate these percentages. They are | algorithm does not explicitly calculate these percentages. They are | |||
| just the outcome of the number of packets arriving from each flow | just the outcome of the number of packets arriving from each flow | |||
| during the burst. | during the burst. | |||
| To summarize, the queuing score never sanctions rate solely on its | To summarize, the queuing score never sanctions rate solely on its | |||
| own account. It only sanctions rate inasmuch as it causes queuing. | own account. It only sanctions rate inasmuch as it causes queuing. | |||
| skipping to change at page 19, line 45 ¶ | skipping to change at line 815 ¶ | |||
| |------Capacity-|b|----------,-.----------|b|----------|b\----- | |------Capacity-|b|----------,-.----------|b|----------|b\----- | |||
| | __|_|_______ |b| /``\| _...-._-': | ,.-- | | __|_|_______ |b| /``\| _...-._-': | ,.-- | |||
| | ,-. __/ \__|_|_ _/ |/ \|/ | | ,-. __/ \__|_|_ _/ |/ \|/ | |||
| | |b| ___/ \___/ __ r | | |b| ___/ \___/ __ r | |||
| | |_|/ v \__/ \_______ _/\____/ | | |_|/ v \__/ \_______ _/\____/ | |||
| | _/ \__/ | | _/ \__/ | |||
| | | | | |||
| +----------------------------------------------------------------> | +----------------------------------------------------------------> | |||
| time | time | |||
| Figure 2: Responsibility for Queuing: More Complex Scenario | Figure 2: Responsibility for Queuing: More-Complex Scenario | |||
| Figure 2 gives a more complex illustration of the way the queuing | Figure 2 gives a more-complex illustration of the way the queuing | |||
| score assigns responsibility for queuing (limited to the precision | score assigns responsibility for queuing (limited to the precision | |||
| that ASCII art can illustrate). The figure shows the bit rates of | that ASCII art can illustrate). The figure shows the bit rates of | |||
| three flows represented as stacked areas labelled b, v and r. The | three flows represented as stacked areas labelled b, v, and r. The | |||
| unresponsive bursts (b) are the same as in the previous example, but | unresponsive bursts (b) are the same as in the previous example, but | |||
| a variable rate video (v) replaces flow c. It's rate varies as the | a variable-rate video (v) replaces flow c. Its rate varies as the | |||
| complexity of the video scene varies. Also on a slower timescale, in | complexity of the video scene varies. Also, on a slower timescale, | |||
| response to the level of congestion, the video adapts its quality. | in response to the level of congestion, the video adapts its quality. | |||
| However, on a short time-scale it appears to be unresponsive to small | However, on a short timescale it appears to be unresponsive to small | |||
| amounts of queuing. Also, part-way through, a low latency responsive | amounts of queuing. Also, partway through, a low-latency responsive | |||
| flow (r) joins in, aiming to fill the balance of capacity left by the | flow (r) joins in, aiming to fill the balance of capacity left by the | |||
| other two. | other two. | |||
| The combination of the first burst and the low application-limited | The combination of the first burst and the low application-limited | |||
| rate of the video causes neither flow to accumulate queuing score. | rate of the video causes neither flow to accumulate queuing score. | |||
| In contrast, the second burst causes similar excessive overload | In contrast, the second burst causes similar excessive overload | |||
| (125%) to the example in Figure 1. Then, the video happens to reduce | (125%) to the example in Figure 1. Then, the video happens to reduce | |||
| its rate (probably due to a less complex scene) so the third burst | its rate (probably due to a less-complex scene) so the third burst | |||
| causes only a little congestion. Let us assume the resulting queue | causes only a little congestion. Let us assume the resulting queue | |||
| causes probNative to rise to just 1%, then the queuing score will | causes probNative to rise to just 1%, then the queuing score will | |||
| only accumulate 1% of the size of each packet of flows v and b during | only accumulate 1% of the size of each packet of flows v and b during | |||
| this burst. | this burst. | |||
| The fourth burst happens to arrive just as the new responsive flow | The fourth burst happens to arrive just as the new responsive flow | |||
| (r) has filled the available capacity, so it leads to very rapid | (r) has filled the available capacity, so it leads to very rapid | |||
| growth of the queue. After a round trip the responsive flow rapidly | growth of the queue. After a round trip, the responsive flow rapidly | |||
| backs off, and the adaptive video also backs off more rapidly than it | backs off, and the adaptive video also backs off more rapidly than it | |||
| would normally, because of the very high congestion level. The rapid | would normally because of the very high congestion level. The rapid | |||
| response to congestion of flow r reduces the queuing score that all | response to congestion of flow r reduces the queuing score that all | |||
| three flows accumulate, but they each still bear the cost in | three flows accumulate, but they each still bear the cost in | |||
| proportion to the product of the rates at which their packets arrive | proportion to the product of the rates at which their packets arrive | |||
| at the queue and the value of probNative when they do so. Thus, | at the queue and the value of probNative when they do so. Thus, | |||
| during the fifth burst, they all accumulate less score than the | during the fifth burst, they all accumulate a lower score than the | |||
| fourth, because the queuing delay is not as excessive. | fourth because the queuing delay is not as excessive. | |||
| 5.2. Rationale for Constant Aging of the Queuing Score | 5.2. Rationale for Constant Aging of the Queuing Score | |||
| Even well-behaved flows will not always be able to respond fast | Even well-behaved flows will not always be able to respond fast | |||
| enough to dynamic events. Also well-behaved flows, e.g., DCTCP | enough to dynamic events. Also, well-behaved flows, e.g., Data | |||
| [RFC8257], TCP Prague [I-D.briscoe-iccrg-prague-congestion-control], | Center TCP (DCTCP) [RFC8257], TCP Prague [PRAGUE-CC], Bottleneck | |||
| BBRv3 [BBRv3] or the L4S variant of SCReAM [SCReAM] for real-time | Bandwidth and Round-trip propagation time version 3 (BBRv3) [BBRv3], | |||
| media [RFC8298], can maintain a very shallow queue by continual | or the L4S variant of SCReAM [SCReAM] for real-time media [RFC8298], | |||
| careful probing for more while also continually subtracting a little | can maintain a very shallow queue by continual careful probing for | |||
| from their rate (or congestion window) in response to low levels of | more while also continually subtracting a little from their rate (or | |||
| ECN signalling. Therefore, the QProt algorithm needs to continually | congestion window) in response to low levels of ECN signalling. | |||
| offer a degree of forgiveness to age out the queuing score as it | Therefore, the QProt algorithm needs to continually offer a degree of | |||
| accumulates. | forgiveness to age out the queuing score as it accumulates. | |||
| Scalable congestion controllers such as those above maintain their | Scalable congestion controllers, such as those above, maintain their | |||
| congestion window in inverse proportion to the congestion level, | congestion window in inverse proportion to the congestion level, | |||
| probNative. That leads to the important property that on average a | probNative. That leads to the important property that, on average, a | |||
| scalable flow holds the product of its congestion window and the | scalable flow holds the product of its congestion window and the | |||
| congestion level constant, no matter the capacity of the link or how | congestion level constant, no matter the capacity of the link or how | |||
| many other flows it competes with. For instance, if the link | many other flows it competes with. For instance, if the link | |||
| capacity doubles, a scalable flow induces half the congestion | capacity doubles, a scalable flow induces half the congestion | |||
| probability. Or if three scalable flows compete for the capacity, | probability. Or, if three scalable flows compete for the capacity, | |||
| each flow will reduce to one third of the capacity they would use on | each flow will reduce to one third of the capacity they would use on | |||
| their own and increase the congestion level by 3x. Therefore, in | their own and increase the congestion level by 3x. Therefore, in | |||
| steady state, a scalable flow will induce the same constant amount of | steady state, a scalable flow will induce the same constant amount of | |||
| "congested-bytes" per round trip, whatever the link capacity, and no | "congested-bytes" per round trip, whatever the link capacity and no | |||
| matter how many flows are sharing the capacity. | matter how many flows are sharing the capacity. | |||
| This suggests that the QProt algorithm will not sanction a well- | This suggests that the QProt algorithm will not sanction a well- | |||
| behaved scalable flow if it ages out the queuing score at a | behaved scalable flow if it ages out the queuing score at a | |||
| sufficient constant rate. The constant will need to be somewhat | sufficient constant rate. The constant will need to be somewhat | |||
| above the average of a well-behaved scalable flow to allow for normal | above the average of a well-behaved scalable flow to allow for normal | |||
| dynamics. | dynamics. | |||
| Relating QProt's aging constant to a scalable flow does not mean that | Relating QProt's aging constant to a scalable flow does not mean that | |||
| a flow has to behave like a scalable flow. It can be less | a flow has to behave like a scalable flow: it can be less aggressive | |||
| aggressive, but not more. For instance, a longer RTT flow can run at | but not more aggressive. For instance, a longer RTT flow can run at | |||
| a lower congestion-rate than the aging rate, but it can also increase | a lower congestion-rate than the aging rate, but it can also increase | |||
| its aggressiveness to equal the rate of short RTT scalable flows | its aggressiveness to equal the rate of short RTT scalable flows | |||
| [ScalingCC]. The constant aging of QProt also means that a long- | [ScalingCC]. The constant aging of QProt also means that a long- | |||
| running unresponsive flow will be prone to trigger QProt if it runs | running unresponsive flow will be prone to trigger QProt if it runs | |||
| faster than a competing responsive scalable flow would. And, of | faster than a competing responsive scalable flow would. And, of | |||
| course, if a flow causes excessive queuing in the short-term, its | course, if a flow causes excessive queuing in the short term, its | |||
| queuing score will still rise faster than the constant aging process | queuing score will still rise faster than the constant aging process | |||
| will decrease it. Then QProt will still eject the flow's packets | will decrease it. Then, QProt will still eject the flow's packets | |||
| before they harm the low latency of the shared queue. | before they harm the low latency of the shared queue. | |||
| 5.3. Rationale for Transformed Queuing Score | 5.3. Rationale for Transformed Queuing Score | |||
| The QProt algorithm holds a flow's queuing score state in a structure | The QProt algorithm holds a flow's queuing score state in a structure | |||
| called a bucket, because of its similarity to a classic leaky bucket | called a "bucket". This is because of its similarity to a classic | |||
| (except the contents of the bucket does not represent bytes). | leaky bucket (except the contents of the bucket do not represent | |||
| bytes). | ||||
| probNative * pkt_sz probNative * pkt_sz / AGING | probNative * pkt_sz probNative * pkt_sz / AGING | |||
| | | | | | | |||
| | V | | V | | | V | | V | | |||
| | : | ___ | : | | | : | ___ | : | | |||
| |_____| ___ |_____| | |_____| ___ |_____| | |||
| | | ___ | | | | | ___ | | | |||
| |__ __| |__ __| | |__ __| |__ __| | |||
| | | | | | | |||
| V V | V V | |||
| skipping to change at page 22, line 17 ¶ | skipping to change at line 930 ¶ | |||
| times when the scores of the current and previous packets were | times when the scores of the current and previous packets were | |||
| processed. | processed. | |||
| A transformed equivalent of this token bucket is shown on the right | A transformed equivalent of this token bucket is shown on the right | |||
| of Figure 3, dividing both the input and output by the constant AGING | of Figure 3, dividing both the input and output by the constant AGING | |||
| rate. The result is a bucket-depth that represents time and it | rate. The result is a bucket-depth that represents time and it | |||
| drains at the rate that time passes. | drains at the rate that time passes. | |||
| As a further optimization, the time the bucket was last updated is | As a further optimization, the time the bucket was last updated is | |||
| not stored with the flow-state. Instead, when the bucket is | not stored with the flow-state. Instead, when the bucket is | |||
| initialized the queuing score is added to the system time 'now' and | initialized, the queuing score is added to the system time "now" and | |||
| the resulting expiry time is written into the bucket. Subsequently, | the resulting expiry time is written into the bucket. Subsequently, | |||
| if the bucket has not expired, the incremental queuing score is added | if the bucket has not expired, the incremental queuing score is added | |||
| to the time already held in the bucket. Then the queuing score | to the time already held in the bucket. Then, the queuing score | |||
| always represents the expiry time of the flow-state itself. This | always represents the expiry time of the flow-state itself. This | |||
| means that the queuing score does not need to be aged explicitly | means that the queuing score does not need to be aged explicitly | |||
| because it ages itself implicitly. | because it ages itself implicitly. | |||
| 5.4. Rationale for Policy Conditions | 5.4. Rationale for Policy Conditions | |||
| Pseudocode for the QProt policy conditions is given in Section 4.1 | Pseudocode for the QProt policy conditions is given in Section 4.1 | |||
| within the second half of the qprotect() function. When each packet | within the second half of the qprotect() function. When each packet | |||
| arrives, after finding its flow state and updating the queuing score | arrives, after finding its flow state and updating the queuing score | |||
| of the packet's flow, the algorithm checks whether the shared queue | of the packet's flow, the algorithm checks whether the shared queue | |||
| skipping to change at page 23, line 5 ¶ | skipping to change at line 966 ¶ | |||
| temporarily scaled up by the ratio of the current queue delay to the | temporarily scaled up by the ratio of the current queue delay to the | |||
| threshold queuing delay, CRITICALqL (the reason for the scaling is | threshold queuing delay, CRITICALqL (the reason for the scaling is | |||
| given next). If this scaled up score exceeds another constant | given next). If this scaled up score exceeds another constant | |||
| threshold CRITICALqLSCORE, the packet is ejected. The actual last | threshold CRITICALqLSCORE, the packet is ejected. The actual last | |||
| line of code above multiplies both sides of the second condition by | line of code above multiplies both sides of the second condition by | |||
| CRITICALqL to avoid a costly division. | CRITICALqL to avoid a costly division. | |||
| This approach allows each packet to be assessed once, as it arrives. | This approach allows each packet to be assessed once, as it arrives. | |||
| Once queue delay exceeds the threshold, it has two implications: | Once queue delay exceeds the threshold, it has two implications: | |||
| * The current packet might be ejected even though there are packets | * The current packet might be ejected, even though there are packets | |||
| already in the queue from flows with higher queuing scores. | already in the queue from flows with higher queuing scores. | |||
| However, any flow that continues to contribute to the queue will | However, any flow that continues to contribute to the queue will | |||
| have to send further packets, giving an opportunity to eject them | have to send further packets, giving an opportunity to eject them | |||
| as well, as they subsequently arrive. | as well, as they subsequently arrive. | |||
| * The next packets to arrive might not be ejected, because they | * The next packets to arrive might not be ejected because they might | |||
| might belong to flows with low queuing scores. In this case, | belong to flows with low queuing scores. In this case, queue | |||
| queue delay could continue to rise with no opportunity to eject a | delay could continue to rise with no opportunity to eject a | |||
| packet. This is why the queuing score is scaled up by the current | packet. This is why the queuing score is scaled up by the current | |||
| queue delay. Then, the more the queue has grown without ejecting | queue delay. Then, the more the queue has grown without ejecting | |||
| a packet, the more the algorithm 'raises the bar' to further | a packet, the more the algorithm "raises the bar" to further | |||
| packets. | packets. | |||
| The above approach is preferred over the extra per-packet processing | The above approach is preferred over the extra per-packet processing | |||
| cost of searching the buckets for the flow with the highest queuing | cost of searching the buckets for the flow with the highest queuing | |||
| score and searching the queue for one of its packets to eject (if one | score and searching the queue for one of its packets to eject (if one | |||
| is still in the queue). | is still in the queue). | |||
| Note that by default CRITICALqL_us is set to the maximum threshold of | Note that, by default, CRITICALqL_us is set to the maximum threshold | |||
| the ramp marking algorithm, MAXTH_us. However, there is some debate | of the ramp marking algorithm MAXTH_us. However, there is some | |||
| as to whether setting it to the minimum threshold instead would | debate as to whether setting it to the minimum threshold instead | |||
| improve QProt performance. This would roughly double the ratio of | would improve QProt performance. This would roughly double the ratio | |||
| qdelay to CRITICALqL, which is compared against the CRITICALqLSCORE | of qdelay to CRITICALqL, which is compared against the | |||
| threshold. So the threshold would have to be roughly doubled | CRITICALqLSCORE threshold. So, the threshold would have to be | |||
| accordingly. | roughly doubled accordingly. | |||
| Figure 4 explains this approach graphically. On the horizontal axis | Figure 4 explains this approach graphically. On the horizontal axis, | |||
| it shows actual harm, meaning the queuing delay in the shared queue. | it shows actual harm, meaning the queuing delay in the shared queue. | |||
| On the vertical axis it shows the behaviour record of the flow | On the vertical axis, it shows the behavior record of the flow | |||
| associated with the currently arriving packet, represented in the | associated with the currently arriving packet, represented in the | |||
| algorithm by the flow's queuing score. The shaded region represents | algorithm by the flow's queuing score. The shaded region represents | |||
| the combination of actual harm and behaviour record that will lead to | the combination of actual harm and behavior record that will lead to | |||
| the packet being ejected. | the packet being ejected. | |||
| Behaviour Record: | Behavior Record: | |||
| Queueing Score of | Queueing Score of | |||
| Arriving Packet's Flow | Arriving Packet's Flow | |||
| ^ | ^ | |||
| | + |/ / / / / / / / / / / / / / / / / / / | | + |/ / / / / / / / / / / / / / / / / / / | |||
| | + N | / / / / / / / / / / / / / / / / / / / | | + N | / / / / / / / / / / / / / / / / / / / | |||
| | + |/ / / / / / / / / / | | + |/ / / / / / / / / / | |||
| | + | / / / / E (Eject packet) / / / / / | | + | / / / / E (Eject packet) / / / / / | |||
| | + |/ / / / / / / / / / | | + |/ / / / / / / / / / | |||
| | + | / / / / / / / / / / / / / / / / / / / | | + | / / / / / / / / / / / / / / / / / / / | |||
| | + |/ / / / / / / / / / / / / / / / / / / | | + |/ / / / / / / / / / / / / / / / / / / | |||
| skipping to change at page 24, line 28 ¶ | skipping to change at line 1024 ¶ | |||
| | N | + / / / / / / / / / / / / / / / / / | | N | + / / / / / / / / / / / / / / / / / | |||
| | (No actual | +/ / / / / / / / / / / / / / / | | (No actual | +/ / / / / / / / / / / / / / / | |||
| | harm) | + / / / / / / / / / / / / | | harm) | + / / / / / / / / / / / / | |||
| | | P (Pass over) + ,/ / / / / / / / | | | P (Pass over) + ,/ / / / / / / / | |||
| | | ^ + /./ /_/ | | | ^ + /./ /_/ | |||
| +--------------+------------------------------------------> | +--------------+------------------------------------------> | |||
| CRITICALqL Actual Harm: Shared Queue Delay | CRITICALqL Actual Harm: Shared Queue Delay | |||
| Figure 4: Graphical Explanation of the Policy Conditions | Figure 4: Graphical Explanation of the Policy Conditions | |||
| The regions labelled 'N' represent cases where the first condition is | The regions labelled "N" represent cases where the first condition is | |||
| not met - no actual harm - queue delay is below the critical | not met -- no actual harm -- queue delay is below the critical | |||
| threshold, CRITICALqL. | threshold, CRITICALqL. | |||
| The region labelled 'E' represents cases where there is actual harm | The region labelled "E" represents cases where there is actual harm | |||
| (queue delay exceeds CRITICALqL) and the queuing score associated | (queue delay exceeds CRITICALqL) and the queuing score associated | |||
| with the arriving packet is high enough to be able to eject it with | with the arriving packet is high enough to be able to eject it with | |||
| certainty. | certainty. | |||
| The region labelled 'P' represents cases where there is actual harm, | The region labelled "P" represents cases where there is actual harm, | |||
| but the queuing score of the arriving packet is insufficient to eject | but the queuing score of the arriving packet is insufficient to eject | |||
| it, so it has to be Passed over. This adds to queuing delay, but the | it, so it has to be passed over. This adds to queuing delay, but the | |||
| alternative would be to sanction an innocent flow. It can be seen | alternative would be to sanction an innocent flow. It can be seen | |||
| that, as actual harm increases, the judgement of innocence becomes | that, as actual harm increases, the judgment of innocence becomes | |||
| increasingly stringent; the behaviour record of the next packet's | increasingly stringent; the behavior record of the next packet's flow | |||
| flow does not have to be as bad to eject it. | does not have to be as bad to eject it. | |||
| Conditioning ejection on actual harm helps prevent VPN packets being | Conditioning ejection on actual harm helps prevent VPN packets being | |||
| ejected unnecessarily. VPNs consisting of multiple flows can tend to | ejected unnecessarily. VPNs consisting of multiple flows can tend to | |||
| accumulate queuing score faster than it is aged out, because the | accumulate queuing score faster than it is aged out because the aging | |||
| aging rate is intended for a single flow. However, whether or not | rate is intended for a single flow. However, whether or not some | |||
| some traffic is in a VPN, the queue delay threshold (CRITICALqL) will | traffic is in a VPN, the queue delay threshold (CRITICALqL) will be | |||
| be no more likely to be exceeded. So conditioning ejection on actual | no more likely to be exceeded. So, conditioning ejection on actual | |||
| harm helps reduce the chance that VPN traffic will be ejected by the | harm helps reduce the chance that VPN traffic will be ejected by the | |||
| QProt function. | QProt function. | |||
| 5.5. Rationale for Reclassification as the Policy Action | 5.5. Rationale for Reclassification as the Policy Action | |||
| When the DOCSIS QProt algorithm deems that it is necessary to eject a | When the DOCSIS QProt algorithm deems that it is necessary to eject a | |||
| packet to protect the Low Latency queue, it redirects the packet to | packet to protect the Low-Latency queue, it redirects the packet to | |||
| the Classic queue. In the Low Latency DOCSIS architecture (as in | the Classic queue. In the Low-Latency DOCSIS architecture (as in | |||
| Coupled DualQ AQMs generally), the Classic queue is expected to | Coupled DualQ AQMs generally), the Classic queue is expected to | |||
| frequently have a larger backlog of packets, caused by classic | frequently have a larger backlog of packets, which is caused by | |||
| congestion controllers interacting with a classic AQM (which has a | classic congestion controllers interacting with a classic AQM (which | |||
| latency target of 10ms) as well as other bursty traffic. | has a latency target of 10 ms) as well as other bursty traffic. | |||
| Therefore, typically, an ejected packet will experience higher | Therefore, typically, an ejected packet will experience higher | |||
| queuing delay than it would otherwise, and it could be re-ordered | queuing delay than it would otherwise, and it could be re-ordered | |||
| within its flow (assuming QProt does not eject all packets of an | within its flow (assuming QProt does not eject all packets of an | |||
| anomalous flow). The mild harm caused to the performance of the | anomalous flow). The mild harm caused to the performance of the | |||
| ejected packet's flow is deliberate. It gives senders a slight | ejected packet's flow is deliberate. It gives senders a slight | |||
| incentive to identify their packets correctly. | incentive to identify their packets correctly. | |||
| If there were no such harm, there would be nothing to prevent all | If there were no such harm, there would be nothing to prevent all | |||
| flows from identifying themselves as suitable for classification into | flows from identifying themselves as suitable for classification into | |||
| the low latency queue, and just letting QProt sort the resulting | the low-latency queue and just letting QProt sort the resulting | |||
| aggregate into queue-building and non-queue-building flows. This | aggregate into queue-building and non-queue-building flows. This | |||
| might seem like a useful alternative to requiring senders to | might seem like a useful alternative to requiring senders to | |||
| correctly identify their flows. However, handling of mis-classified | correctly identify their flows. However, handling of mis-classified | |||
| flows is not without a cost. The more packets that have to be | flows is not without a cost. The more packets that have to be | |||
| reclassified, the more often the delay of the low latency queue would | reclassified, the more often the delay of the low-latency queue would | |||
| exceed the threshold. Also more memory would be required to hold the | exceed the threshold. Also, more memory would be required to hold | |||
| extra flow state. | the extra flow state. | |||
| When a packet is redirected into the Classic queue, an operator might | When a packet is redirected into the Classic queue, an operator might | |||
| want to alter the identifier(s) that originally caused it to be | want to alter the identifier(s) that originally caused it to be | |||
| classified into the Low Latency queue, so that the packet will not be | classified into the Low-Latency queue so that the packet will not be | |||
| classified into another low latency queue further downstream. | classified into another low-latency queue further downstream. | |||
| However, redirection of occasional packets can be due to unusually | However, redirection of occasional packets can be due to unusually | |||
| high transient load just at the specific bottleneck, not necessarily | high transient load just at the specific bottleneck, not necessarily | |||
| at any other bottleneck, and not necessarily due to bad flow | at any other bottleneck and not necessarily due to bad flow behavior. | |||
| behaviour. Therefore, Section 5.4.1.2 of [RFC9331] precludes a | Therefore, Section 5.4.1.2 of [RFC9331] precludes a network node from | |||
| network node from altering the end-to-end ECN field to exclude | altering the end-to-end ECN field to exclude traffic from L4S | |||
| traffic from L4S treatment. Instead a local-use identifier ought to | treatment. Instead a local-use identifier ought to be used (e.g., | |||
| be used (e.g., Diffserv Codepoint or VLAN tag), so that each operator | Diffserv Codepoint or VLAN tag) so that each operator can apply its | |||
| can apply its own policy, without prejudging what other operators | own policy, without prejudging what other operators ought to do. | |||
| ought to do. | ||||
| Although not supported in the DOCSIS specs, QProt could be extended | Although not supported in the DOCSIS specifications, QProt could be | |||
| to recognize that large numbers of redirected packets belong to the | extended to recognize that large numbers of redirected packets belong | |||
| same flow. This might be detected when the bucket expiry time t_exp | to the same flow. This might be detected when the bucket expiry time | |||
| exceeds a threshold. Depending on policy and implementation | t_exp exceeds a threshold. Depending on policy and implementation | |||
| capabilities, QProt could then install a classifier to redirect a | capabilities, QProt could then install a classifier to redirect a | |||
| whole flow into the Classic queue, with an idle timeout to remove | whole flow into the Classic queue, with an idle timeout to remove | |||
| stale classifiers. In these 'persistent offender' cases, QProt might | stale classifiers. In these "persistent offender" cases, QProt might | |||
| also overwrite each redirected packet's DSCP or clear its ECN field | also overwrite each redirected packet's DSCP or clear its ECN field | |||
| to Not-ECT, in order to protect other potential L4S queues | to Not-ECT, in order to protect other potential L4S queues | |||
| downstream. The DOCSIS specs do not discuss sanctioning whole flows, | downstream. The DOCSIS specifications do not discuss sanctioning | |||
| so further discussion is beyond the scope of the present document. | whole flows; further discussion is beyond the scope of the present | |||
| document. | ||||
| 6. Limitations | 6. Limitations | |||
| The QProt algorithm groups packets with common layer-4 flow | The QProt algorithm groups packets with common Layer 4 flow | |||
| identifiers. It then uses this grouping to accumulate queuing scores | identifiers. It then uses this grouping to accumulate queuing scores | |||
| and to sanction packets. | and to sanction packets. | |||
| This choice of identifier for grouping is pragmatic with no | This choice of identifier for grouping is pragmatic with no | |||
| scientific basis. All the packets of a flow certainly pass between | scientific basis. All the packets of a flow certainly pass between | |||
| the same two endpoints. But some applications might initiate | the same two endpoints. However, some applications might initiate | |||
| multiple flows between the same end-points, e.g., for media, control, | multiple flows between the same endpoints, e.g., for media, control, | |||
| data, etc. Others might use common flow identifiers for all these | data, etc. Others might use common flow identifiers for all these | |||
| streams. Also, a user might group multiple application flows within | streams. Also, a user might group multiple application flows within | |||
| the same encrypted VPN between the same layer-4 tunnel end-points. | the same encrypted VPN between the same Layer 4 tunnel endpoints. | |||
| And even if there were a one-to-one mapping between flows and | And, even if there were a one-to-one mapping between flows and | |||
| applications, there is no reason to believe that the rate at which | applications, there is no reason to believe that the rate at which | |||
| congestion can be caused ought to be allocated on a per application | congestion can be caused ought to be allocated on a per-application- | |||
| flow basis. | flow basis. | |||
| The use of a queuing score that excludes those aspects of flow rate | The use of a queuing score that excludes those aspects of flow rate | |||
| that do not contribute to queuing (Section 5.1) goes some way to | that do not contribute to queuing (Section 5.1) goes some way to | |||
| mitigating this limitation, because the algorithm does not judge | mitigating this limitation because the algorithm does not judge | |||
| responsibility for queuing delay primarily on the combined rate of a | responsibility for queuing delay primarily on the combined rate of a | |||
| set of flows grouped under one flow ID. | set of flows grouped under one flow ID. | |||
| 7. IANA Considerations (to be removed by RFC Editor) | 7. IANA Considerations | |||
| This specification contains no IANA considerations. | This document has no IANA actions. | |||
| 8. Implementation Status | 8. Implementation Status | |||
| +================+================================================+ | +================+================================================+ | |||
| | Implementation | DOCSIS models for ns-3 | | | Implementation | DOCSIS models for ns-3 | | |||
| | name: | | | | name: | | | |||
| +================+================================================+ | +================+================================================+ | |||
| | Organization | CableLabs | | | Organization | CableLabs | | |||
| +----------------+------------------------------------------------+ | +----------------+------------------------------------------------+ | |||
| | Web page | https://apps.nsnam.org/app/docsis-ns3/ | | | Web page | https://apps.nsnam.org/app/docsis-ns3/ | | |||
| +----------------+------------------------------------------------+ | +----------------+------------------------------------------------+ | |||
| | Description | ns-3 simulation models developed and used in | | | Description | ns-3 simulation models developed and used in | | |||
| | | support of the Low Latency DOCSIS development, | | | | support of the Low-Latency DOCSIS development, | | |||
| | | including models of Dual Queue Coupled AQM, | | | | including models of Dual Queue Coupled AQM, | | |||
| | | Queue Protection, and the DOCSIS MAC | | | | Queue Protection, and the DOCSIS MAC | | |||
| +----------------+------------------------------------------------+ | +----------------+------------------------------------------------+ | |||
| | Maturity | Simulation models that can also be used in | | | Maturity | Simulation models that can also be used in | | |||
| | | emulation mode in a testbed context | | | | emulation mode in a testbed context | | |||
| +----------------+------------------------------------------------+ | +----------------+------------------------------------------------+ | |||
| | Coverage | Complete implementation of Annex P of DOCSIS | | | Coverage | Complete implementation of Annex P of DOCSIS | | |||
| | | 3.1 | | | | 3.1 | | |||
| +----------------+------------------------------------------------+ | +----------------+------------------------------------------------+ | |||
| | Version | DOCSIS 3.1, version I21; | | | Version | DOCSIS 3.1, version I21; | | |||
| | | https://www.cablelabs.com/specifications/CM- | | | | https://www.cablelabs.com/specifications/CM- | | |||
| | | SP-MULPIv3.1?v=I21 | | | | SP-MULPIv3.1?v=I21 | | |||
| +----------------+------------------------------------------------+ | +----------------+------------------------------------------------+ | |||
| | Licence | GPLv2 | | | Licence | GPLv2 | | |||
| +----------------+------------------------------------------------+ | +----------------+------------------------------------------------+ | |||
| | Contact | via web page | | | Contact | via web page | | |||
| +----------------+------------------------------------------------+ | +----------------+------------------------------------------------+ | |||
| | Last Impl'n | Mar 2022 | | | Last | Mar 2022 | | |||
| | update | | | | Implementation | | | |||
| | Update | | | ||||
| +----------------+------------------------------------------------+ | +----------------+------------------------------------------------+ | |||
| | Information | 7 Mar 2022 | | | Information | 7 Mar 2022 | | |||
| | valid at | | | | valid at | | | |||
| +----------------+------------------------------------------------+ | +----------------+------------------------------------------------+ | |||
| Table 1 | Table 1 | |||
| There are also a number of closed source implementations, including 2 | There are also a number of closed source implementations, including | |||
| cable modem implementations written by different chipset | two cable modem implementations written by different chipset | |||
| manufacturers, and one CMTS implementation by a third manufacturer. | manufacturers and several CMTS implementations by other | |||
| These, as well as the ns-3 implementation, have passed the full suite | manufacturers. These, as well as the ns-3 implementation, have | |||
| of compliance tests developed by CableLabs. | passed the full suite of compliance tests developed by CableLabs. | |||
| 9. Security Considerations | 9. Security Considerations | |||
| The whole of this document concerns traffic security. It considers | The whole of this document concerns traffic security. It considers | |||
| the security question of how to identify and eject traffic that does | the security question of how to identify and eject traffic that does | |||
| not comply with the non-queue-building behaviour required to use a | not comply with the non-queue-building behavior required to use a | |||
| shared low latency queue, whether accidentally or maliciously. | shared low-latency queue, whether accidentally or maliciously. | |||
| Section 8.2 of the L4S architecture [RFC9330] introduces the problem | Section 8.2 of [RFC9330] of the L4S architecture [RFC9330] introduces | |||
| of maintaining low latency by either self-restraint or enforcement, | the problem of maintaining low latency by either self-restraint or | |||
| and places DOCSIS queue protection in context within a wider set of | enforcement and places DOCSIS queue protection in context within a | |||
| approaches to the problem. | wider set of approaches to the problem. | |||
| 9.1. Resource Exhaustion Attacks | 9.1. Resource Exhaustion Attacks | |||
| The algorithm has been designed to fail gracefully in the face of | The algorithm has been designed to fail gracefully in the face of | |||
| traffic crafted to overrun the resources used for the algorithm's own | traffic crafted to overrun the resources used for the algorithm's own | |||
| processing and flow state. This means that non-queue-building flows | processing and flow state. This means that non-queue-building flows | |||
| will always be less likely to be sanctioned than queue-building | will always be less likely to be sanctioned than queue-building | |||
| flows. But an attack could be contrived to deplete resources in such | flows. But an attack could be contrived to deplete resources in such | |||
| a way that the proportion of innocent (non-queue-building) flows that | a way that the proportion of innocent (non-queue-building) flows that | |||
| are incorrectly sanctioned could increase. | are incorrectly sanctioned could increase. | |||
| Incorrect sanctioning is intended not to be catastrophic; it results | Incorrect sanctioning is intended not to be catastrophic; it results | |||
| in more packets from well-behaved flows being redirected into the | in more packets from well-behaved flows being redirected into the | |||
| Classic queue; thus introducing more reordering into innocent flows. | Classic queue, which introduces more reordering into innocent flows. | |||
| 9.1.1. Exhausting Flow-State Storage | 9.1.1. Exhausting Flow-State Storage | |||
| To exhaust the number of buckets, the most efficient attack is to | To exhaust the number of buckets, the most efficient attack is to | |||
| send enough long-running attack flows to increase the chance that an | send enough long-running attack flows to increase the chance that an | |||
| arriving flow will not find an available bucket, and therefore have | arriving flow will not find an available bucket and will, therefore, | |||
| to share the 'dregs' bucket. For instance, if ATTEMPTS=2 and | have to share the "dregs" bucket. For instance, if ATTEMPTS=2 and | |||
| NBUCKETS=32, it requires about 94 attack flows, each using different | NBUCKETS=32, it requires about 94 attack flows, each using different | |||
| port numbers, to increase the probability to 99% that an arriving | port numbers, to increase the probability to 99% that an arriving | |||
| flow will have to share the dregs, where it will share a high degree | flow will have to share the dregs, where it will share a high degree | |||
| of redirection into the C queue with the remainder of the attack | of redirection into the C queue with the remainder of the attack | |||
| flows. | flows. | |||
| For an attacker to keep buckets busy, it is more efficient to hold | For an attacker to keep buckets busy, it is more efficient to hold | |||
| onto them by cycling regularly through a set of port numbers (94 in | onto them by cycling regularly through a set of port numbers (94 in | |||
| the above example), rather than to keep occupying and releasing | the above example) rather than to keep occupying and releasing | |||
| buckets with single packet flows across a much larger number of | buckets with single packet flows across a much larger number of | |||
| ports. | ports. | |||
| During such an attack, the coupled marking probability will have | During such an attack, the coupled marking probability will have | |||
| saturated at 100%. So, to hold a bucket, the rate of an attack flow | saturated at 100%. So, to hold a bucket, the rate of an attack flow | |||
| needs to be no less than the AGING rate of each bucket; 4Mb/s by | needs to be no less than the AGING rate of each bucket: 4 Mb/s by | |||
| default. However, for an attack flow to be sure to hold on to its | default. However, for an attack flow to be sure to hold on to its | |||
| bucket, it would need to send somewhat faster. Thus an attack with | bucket, it would need to send somewhat faster. Thus, an attack with | |||
| 100 flows would need a total force of more than 100 * 4Mb/s = 400Mb/ | 100 flows would need a total force of more than 100 * 4 Mb/s = 400 | |||
| s. | Mb/s. | |||
| This attack can be mitigated (but not prevented) by increasing the | This attack can be mitigated (but not prevented) by increasing the | |||
| number of buckets. The required attack force scales linearly with | number of buckets. The required attack force scales linearly with | |||
| the number of buckets, NBUCKETS. So, if NBUCKETS were doubled to 64, | the number of buckets, NBUCKETS. So, if NBUCKETS were doubled to 64, | |||
| twice as many 4Mb/s flows would be needed to maintain the same impact | twice as many 4 Mb/s flows would be needed to maintain the same | |||
| on innocent flows. | impact on innocent flows. | |||
| Probably the most effective mitigation would be to implement | Probably the most effective mitigation would be to implement | |||
| redirection of whole-flows once enough of the individual packets of a | redirection of whole-flows once enough of the individual packets of a | |||
| certain offending flow had been redirected. This would free up the | certain offending flow had been redirected. This would free up the | |||
| buckets used to maintain the per-packet queuing score of persistent | buckets used to maintain the per-packet queuing score of persistent | |||
| offenders. Whole-flow redirection is outside the scope of the | offenders. Whole-flow redirection is outside the scope of the | |||
| current version of the QProt algorithm specified here, but it is | current version of the QProt algorithm specified here, but it is | |||
| briefly discussed at the end of Section 5.5. | briefly discussed at the end of Section 5.5. | |||
| It might be considered that all the packets of persistently offending | It might be considered that all the packets of persistently offending | |||
| flows ought to be discarded rather than redirected. However, this is | flows ought to be discarded rather than redirected. However, this is | |||
| not recommended, because attack flows might be able to pervert whole- | not recommended because attack flows might be able to pervert whole- | |||
| flow discard, turning it onto at least some innocent flows, thus | flow discard, turning it onto at least some innocent flows, thus | |||
| amplifying an attack that causes reordering into total deletion of | amplifying an attack that causes reordering into total deletion of | |||
| some innocent flows. | some innocent flows. | |||
| 9.1.2. Exhausting Processing Resources | 9.1.2. Exhausting Processing Resources | |||
| The processing time needed to apply the QProt algorithm to each LL | The processing time needed to apply the QProt algorithm to each LL | |||
| packet is small and intended not to take all the time available | packet is small and intended not to take all the time available | |||
| between each of a run of fairly small packets. However, an attack | between each of a run of fairly small packets. However, an attack | |||
| could use minimum size packets launched from multiple input | could use minimum sized packets launched from multiple input | |||
| interfaces into a lower capacity output interface. Whether the QProt | interfaces into a lower capacity output interface. Whether the QProt | |||
| algorithm is vulnerable to processor exhaustion will depend on the | algorithm is vulnerable to processor exhaustion will depend on the | |||
| specific implementation. | specific implementation. | |||
| Addition of a capability to redirect persistently offending flows | Addition of a capability to redirect persistently offending flows | |||
| from LL to C would be the most effective way to reduce the per-packet | from LL to C would be the most effective way to reduce the per-packet | |||
| processing cost of the QProt algorithm, when under attack. As | processing cost of the QProt algorithm when under attack. As already | |||
| already mentioned in Section 9.1.1, this would also be an effective | mentioned in Section 9.1.1, this would also be an effective way to | |||
| way to mitigate flow-state exhaustion attacks. Further discussion of | mitigate flow-state exhaustion attacks. Further discussion of whole- | |||
| whole-flow redirection is outside the scope of the present document, | flow redirection is outside the scope of the present document but is | |||
| but briefly discussed at the end of Section 5.5. | briefly discussed at the end of Section 5.5. | |||
| 10. Comments Solicited | 10. Comments Solicited | |||
| Evaluation and assessment of the algorithm by researchers is | Evaluation and assessment of the algorithm by researchers is | |||
| solicited. Comments and questions are also encouraged and welcome. | solicited. Comments and questions are also encouraged and welcome. | |||
| They can be addressed to the authors. | They can be addressed to the authors. | |||
| 11. Acknowledgements | 11. References | |||
| Thanks to Tom Henderson, Magnus Westerlund, David Black, Adrian | ||||
| Farrel and Gorry Fairhurst for their reviews of this document. The | ||||
| design of the QProt algorithm and the settings of the parameters | ||||
| benefited from discussion and critique from the participants of the | ||||
| cable industry working group on Low Latency DOCSIS. CableLabs funded | ||||
| Bob Briscoe's initial work on this document. | ||||
| 12. References | ||||
| 12.1. Normative References | 11.1. Normative References | |||
| [DOCSIS] CableLabs, "MAC and Upper Layer Protocols Interface | [DOCSIS] CableLabs, "MAC and Upper Layer Protocols Interface | |||
| (MULPI) Specification, CM-SP-MULPIv3.1", Data-Over-Cable | (MULPI) Specification, CM-SP-MULPIv3.1", Data-Over-Cable | |||
| Service Interface Specifications DOCSIS® 3.1 Version I17 | Service Interface Specifications DOCSIS(r) 3.1 Version I17 | |||
| or later, 21 January 2019, <https://specification- | or later, 21 January 2019, | |||
| search.cablelabs.com/CM-SP-MULPIv3.1>. | <https://www.cablelabs.com/specifications/CM-SP- | |||
| MULPIv3.1>. | ||||
| [DOCSIS-CCAP-OSS] | [DOCSIS-CCAP-OSS] | |||
| CableLabs, "CCAP Operations Support System Interface | CableLabs, "CCAP Operations Support System Interface | |||
| Spec", Data-Over-Cable Service Interface Specifications | Specification", Data-Over-Cable Service Interface | |||
| DOCSIS® 3.1 Version I14 or later, 21 January 2019, | Specifications DOCSIS(r) 3.1 Version I14 or later, 7 | |||
| <https://specification-search.cablelabs.com/CM-SP-CM- | February 2019, <https://www.cablelabs.com/specifications/ | |||
| OSSIv3.1>. | CM-SP-CM-OSSIv3.1>. | |||
| [DOCSIS-CM-OSS] | [DOCSIS-CM-OSS] | |||
| CableLabs, "Cable Modem Operations Support System | CableLabs, "Cable Modem Operations Support System | |||
| Interface Spec", Data-Over-Cable Service Interface | Interface Specification", Data-Over-Cable Service | |||
| Specifications DOCSIS® 3.1 Version I14 or later, 21 | Interface Specifications DOCSIS(r) 3.1 Version I14 or | |||
| January 2019, <https://specification-search.cablelabs.com/ | later, 21 January 2019, | |||
| CM-SP-CM-OSSIv3.1>. | <https://www.cablelabs.com/specifications/CM-SP-CCAP- | |||
| OSSIv3.1>. | ||||
| [I-D.ietf-tsvwg-nqb] | ||||
| White, G., Fossati, T., and R. Geib, "A Non-Queue-Building | ||||
| Per-Hop Behavior (NQB PHB) for Differentiated Services", | ||||
| Work in Progress, Internet-Draft, draft-ietf-tsvwg-nqb-21, | ||||
| 7 November 2023, <https://datatracker.ietf.org/doc/html/ | ||||
| draft-ietf-tsvwg-nqb-21>. | ||||
| [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition | [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition | |||
| of Explicit Congestion Notification (ECN) to IP", | of Explicit Congestion Notification (ECN) to IP", | |||
| RFC 3168, DOI 10.17487/RFC3168, September 2001, | RFC 3168, DOI 10.17487/RFC3168, September 2001, | |||
| <https://www.rfc-editor.org/info/rfc3168>. | <https://www.rfc-editor.org/info/rfc3168>. | |||
| [RFC8311] Black, D., "Relaxing Restrictions on Explicit Congestion | [RFC8311] Black, D., "Relaxing Restrictions on Explicit Congestion | |||
| Notification (ECN) Experimentation", RFC 8311, | Notification (ECN) Experimentation", RFC 8311, | |||
| DOI 10.17487/RFC8311, January 2018, | DOI 10.17487/RFC8311, January 2018, | |||
| <https://www.rfc-editor.org/info/rfc8311>. | <https://www.rfc-editor.org/info/rfc8311>. | |||
| [RFC9331] De Schepper, K. and B. Briscoe, Ed., "The Explicit | [RFC9331] De Schepper, K. and B. Briscoe, Ed., "The Explicit | |||
| Congestion Notification (ECN) Protocol for Low Latency, | Congestion Notification (ECN) Protocol for Low Latency, | |||
| Low Loss, and Scalable Throughput (L4S)", RFC 9331, | Low Loss, and Scalable Throughput (L4S)", RFC 9331, | |||
| DOI 10.17487/RFC9331, January 2023, | DOI 10.17487/RFC9331, January 2023, | |||
| <https://www.rfc-editor.org/info/rfc9331>. | <https://www.rfc-editor.org/info/rfc9331>. | |||
| 12.2. Informative References | [RFC9956] White, G., Fossati, T., and R. Geib, "A Non-Queue-Building | |||
| Per-Hop Behavior (NQB PHB) for Differentiated Services", | ||||
| RFC 9956, DOI 10.17487/RFC9956, April 2026, | ||||
| <https://www.rfc-editor.org/info/rfc9956>. | ||||
| [BBRv3] Cardwell, N., "TCP BBR v3 Release", github | 11.2. Informative References | |||
| repository; Linux congestion control module, | ||||
| <https://github.com/google/bbr/blob/v3/README.md>. | ||||
| [I-D.briscoe-iccrg-prague-congestion-control] | [BBRv3] "TCP BBR v3 Release", commit 90210de, 18 March 2025, | |||
| De Schepper, K., Tilmans, O., Briscoe, B., and V. Goel, | <https://github.com/google/bbr/blob/v3/README.md>. | |||
| "Prague Congestion Control", Work in Progress, Internet- | ||||
| Draft, draft-briscoe-iccrg-prague-congestion-control-03, | ||||
| 14 October 2023, <https://datatracker.ietf.org/doc/html/ | ||||
| draft-briscoe-iccrg-prague-congestion-control-03>. | ||||
| [LLD] White, G., Sundaresan, K., and B. Briscoe, "Low Latency | [LLD] White, G., Sundaresan, K., and B. Briscoe, "Low Latency | |||
| DOCSIS: Technology Overview", CableLabs White Paper , | DOCSIS: Technology Overview", CableLabs White Paper, | |||
| February 2019, <https://cablela.bs/low-latency-docsis- | February 2019, <https://cablela.bs/low-latency-docsis- | |||
| technology-overview-february-2019>. | technology-overview-february-2019>. | |||
| [PRAGUE-CC] | ||||
| De Schepper, K., Tilmans, O., Briscoe, B., and V. Goel, | ||||
| "Prague Congestion Control", Work in Progress, Internet- | ||||
| Draft, draft-briscoe-iccrg-prague-congestion-control-04, | ||||
| 24 July 2024, <https://datatracker.ietf.org/doc/html/ | ||||
| draft-briscoe-iccrg-prague-congestion-control-04>. | ||||
| [RFC4303] Kent, S., "IP Encapsulating Security Payload (ESP)", | [RFC4303] Kent, S., "IP Encapsulating Security Payload (ESP)", | |||
| RFC 4303, DOI 10.17487/RFC4303, December 2005, | RFC 4303, DOI 10.17487/RFC4303, December 2005, | |||
| <https://www.rfc-editor.org/info/rfc4303>. | <https://www.rfc-editor.org/info/rfc4303>. | |||
| [RFC6789] Briscoe, B., Ed., Woundy, R., Ed., and A. Cooper, Ed., | [RFC6789] Briscoe, B., Ed., Woundy, R., Ed., and A. Cooper, Ed., | |||
| "Congestion Exposure (ConEx) Concepts and Use Cases", | "Congestion Exposure (ConEx) Concepts and Use Cases", | |||
| RFC 6789, DOI 10.17487/RFC6789, December 2012, | RFC 6789, DOI 10.17487/RFC6789, December 2012, | |||
| <https://www.rfc-editor.org/info/rfc6789>. | <https://www.rfc-editor.org/info/rfc6789>. | |||
| [RFC7713] Mathis, M. and B. Briscoe, "Congestion Exposure (ConEx) | [RFC7713] Mathis, M. and B. Briscoe, "Congestion Exposure (ConEx) | |||
| skipping to change at page 32, line 13 ¶ | skipping to change at line 1375 ¶ | |||
| <https://www.rfc-editor.org/info/rfc9330>. | <https://www.rfc-editor.org/info/rfc9330>. | |||
| [RFC9332] De Schepper, K., Briscoe, B., Ed., and G. White, "Dual- | [RFC9332] De Schepper, K., Briscoe, B., Ed., and G. White, "Dual- | |||
| Queue Coupled Active Queue Management (AQM) for Low | Queue Coupled Active Queue Management (AQM) for Low | |||
| Latency, Low Loss, and Scalable Throughput (L4S)", | Latency, Low Loss, and Scalable Throughput (L4S)", | |||
| RFC 9332, DOI 10.17487/RFC9332, January 2023, | RFC 9332, DOI 10.17487/RFC9332, January 2023, | |||
| <https://www.rfc-editor.org/info/rfc9332>. | <https://www.rfc-editor.org/info/rfc9332>. | |||
| [ScalingCC] | [ScalingCC] | |||
| Briscoe, B. and K. De Schepper, "Resolving Tensions | Briscoe, B. and K. De Schepper, "Resolving Tensions | |||
| between Congestion Control Scaling Requirements", Simula | between Congestion Control Scaling Requirements", | |||
| Technical Report TR-CS-2016-001 arXiv:1904.07605, July | arXiv:1904.07605, Simula Technical Report TR-CS-2016-001, | |||
| 2017, <https://arxiv.org/abs/1904.07605>. | DOI 10.48550/arXiv.1904.07605, July 2017, | |||
| <https://arxiv.org/abs/1904.07605>. | ||||
| [SCReAM] Johansson, I., "SCReAM", github repository; , | [SCReAM] "SCReAM", commit 0208f59, 10 November 2025, | |||
| <https://github.com/EricssonResearch/scream/blob/master/ | <https://github.com/EricssonResearch/scream/blob/master/ | |||
| README.md>. | README.md>. | |||
| Acknowledgements | ||||
| Thanks to Tom Henderson, Magnus Westerlund, David Black, Adrian | ||||
| Farrel, and Gorry Fairhurst for their reviews of this document. The | ||||
| design of the QProt algorithm and the settings of the parameters | ||||
| benefited from discussion and critique from the participants of the | ||||
| cable industry working group on Low-Latency DOCSIS. CableLabs funded | ||||
| Bob Briscoe's initial work on this document. | ||||
| Authors' Addresses | Authors' Addresses | |||
| Bob Briscoe (editor) | Bob Briscoe (editor) | |||
| Independent | Independent | |||
| United Kingdom | United Kingdom | |||
| Email: ietf@bobbriscoe.net | Email: ietf@bobbriscoe.net | |||
| URI: http://bobbriscoe.net/ | URI: https://bobbriscoe.net/ | |||
| Greg White | Greg White | |||
| CableLabs | CableLabs | |||
| United States of America | United States of America | |||
| Email: G.White@CableLabs.com | Email: G.White@CableLabs.com | |||
| End of changes. 159 change blocks. | ||||
| 415 lines changed or deleted | 428 lines changed or added | |||
This html diff was produced by rfcdiff 1.48. | ||||