HOWTO for multiqueue network device support
===========================================

Section 1: Base driver requirements for implementing multiqueue support
Section 2: Qdisc support for multiqueue devices
Section 3: Brief howto using PRIO or RR for multiqueue devices


Intro: Kernel support for multiqueue devices
--------------------------------------------

Kernel support for multiqueue devices is only an API that is presented to the
netdevice layer for base drivers to implement. This feature is part of the
core networking stack, and all network devices will be running on the
multiqueue-aware stack. If a base driver only has one queue, then these
changes are transparent to that driver.

Section 1: Base driver requirements for implementing multiqueue support
-----------------------------------------------------------------------

Base drivers are required to use the new alloc_etherdev_mq() or
alloc_netdev_mq() functions to allocate the subqueues for the device. The
underlying kernel API will take care of the allocation and deallocation of
the subqueue memory, as well as netdev configuration of where the queues
exist in memory.
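
As an illustration, allocation in a driver's probe routine might look like
the following sketch. The names struct foo_adapter and FOO_NUM_TX_QUEUES are
hypothetical placeholders, not part of the kernel API:

	/* Hypothetical driver "foo": allocate a netdev with four Tx
	 * subqueues instead of using the single-queue alloc_etherdev(). */
	#define FOO_NUM_TX_QUEUES 4

	struct net_device *netdev;

	netdev = alloc_etherdev_mq(sizeof(struct foo_adapter),
				   FOO_NUM_TX_QUEUES);
	if (!netdev)
		return -ENOMEM;
	/* A later free_netdev(netdev) releases the subqueue memory too. */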

The base driver will also need to manage the queues as it does the global
netdev->queue_lock today. Therefore base drivers should use the
netif_{start|stop|wake}_subqueue() functions to manage each queue while the
device is still operational. netdev->queue_lock is still used when the device
comes online or when it's completely shut down (unregister_netdev(), etc.).
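
For example, per-queue flow control in a hypothetical driver's transmit and
Tx-clean paths might be sketched as follows; foo_tx_ring_full() and
foo_tx_ring_room() are made-up helpers, only the netif_*_subqueue() calls are
kernel interfaces:

	/* In the xmit path: stall only the subqueue whose ring is full,
	 * leaving the device's other queues running. */
	if (foo_tx_ring_full(adapter, i))
		netif_stop_subqueue(netdev, i);

	/* In the Tx interrupt/clean path: restart that subqueue once
	 * descriptors have been reclaimed. */
	if (foo_tx_ring_room(adapter, i))
		netif_wake_subqueue(netdev, i);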

Finally, the base driver should indicate that it is a multiqueue device. The
feature flag NETIF_F_MULTI_QUEUE should be added to the netdev->features
bitmap on device initialization. Below is an example from e1000:

	#ifdef CONFIG_E1000_MQ
		if ((adapter->hw.mac.type == e1000_82571) ||
		    (adapter->hw.mac.type == e1000_82572) ||
		    (adapter->hw.mac.type == e1000_80003es2lan))
			netdev->features |= NETIF_F_MULTI_QUEUE;
	#endif


Section 2: Qdisc support for multiqueue devices
-----------------------------------------------

Currently two qdiscs support multiqueue devices: a new round-robin qdisc,
sch_rr, and sch_prio. The qdisc is responsible for classifying skbs into
bands and queues, and will store the queue mapping in skb->queue_mapping.
Use this field in the base driver to determine which queue to send the skb
to.
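
In a driver's transmit routine, that mapping might be consumed like this
sketch; the foo names are hypothetical, while skb->queue_mapping and
netdev_priv() are the kernel interfaces involved:

	static int foo_hard_start_xmit(struct sk_buff *skb,
				       struct net_device *netdev)
	{
		struct foo_adapter *adapter = netdev_priv(netdev);
		/* the qdisc stored its band-to-queue decision here */
		struct foo_tx_ring *ring =
			&adapter->tx_ring[skb->queue_mapping];

		/* ... post the skb onto 'ring' ... */
		return NETDEV_TX_OK;
	}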

sch_rr has been added for hardware that doesn't want scheduling policies from
software, so it's a straight round-robin qdisc. It uses the same syntax and
classification priomap that sch_prio uses, so it should be intuitive to
configure for people who've used sch_prio.
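
Since sch_rr shares sch_prio's syntax, loading it should look like the
following; eth0 and the band count are assumptions, and the exact syntax may
vary with your iproute2 version:

# tc qdisc add dev eth0 root handle 1: rr bands 4 multiqueue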

In order to utilize the multiqueue features of the qdiscs, the network
device layer needs to enable multiple queue support. This can be done by
selecting NETDEVICES_MULTIQUEUE under Drivers.

The PRIO qdisc naturally plugs into a multiqueue device. If
NETDEVICES_MULTIQUEUE is selected, then on qdisc load the number of
bands requested is compared to the number of queues on the hardware. If they
are equal, it sets up a one-to-one mapping between the queues and bands. If
they're not equal, it will not load the qdisc. This is the same behavior
for RR. Once the association is made, any skb that is classified will have
skb->queue_mapping set, which will allow the driver to properly queue skbs
to multiple queues.


Section 3: Brief howto using PRIO and RR for multiqueue devices
---------------------------------------------------------------

The userspace command 'tc', part of the iproute2 package, is used to configure
qdiscs. To add the PRIO qdisc to your network device, assuming the device is
called eth0, run the following command:

# tc qdisc add dev eth0 root handle 1: prio bands 4 multiqueue

This will create 4 bands, 0 being the highest priority, and associate those
bands with the queues on your NIC. Assuming eth0 has 4 Tx queues, the band
mapping would look like:

	band 0 => queue 0
	band 1 => queue 1
	band 2 => queue 2
	band 3 => queue 3
93 |
|
|
Traffic will begin flowing through each queue if your TOS values are assigning
|
94 |
|
|
traffic across the various bands. For example, ssh traffic will always try to
|
95 |
|
|
go out band 0 based on TOS -> Linux priority conversion (realtime traffic),
|
96 |
|
|
so it will be sent out queue 0. ICMP traffic (pings) fall into the "normal"
|
97 |
|
|
traffic classification, which is band 1. Therefore pings will be send out
|
98 |
|
|
queue 1 on the NIC.

Note the use of the multiqueue keyword. It is only recognized by versions of
iproute2 that support multiqueue network devices; if it is omitted when
loading a qdisc onto a multiqueue device, the qdisc will load and operate the
same as if it were loaded onto a single-queue device (i.e., it sends all
traffic to queue 0).

An alternative way to allocate the multiqueue bands is to use the multiqueue
option and specify 0 bands. In that case, the qdisc will allocate a number of
bands equal to the number of queues the device reports, and bring the qdisc
online.
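
Assuming the syntax mirrors the earlier PRIO example, that would be:

# tc qdisc add dev eth0 root handle 1: prio bands 0 multiqueue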

The behavior of tc filters remains the same: a matching tc filter will
override TOS-based priority classification.


Author: Peter P. Waskiewicz Jr.