Linux Ethernet Bonding Driver mini-howto

Initial release : Thomas Davis
Corrections, HA extensions : 2000/10/03-15 :
  - Willy Tarreau
  - Constantine Gavrilov
  - Chad N. Tindel
  - Janice Girouard
  - Jay Vosburgh

Note :
------
The bonding driver originally came from Donald Becker's beowulf patches for
kernel 2.0. It has changed quite a bit since, and the original tools from
extreme-linux and beowulf sites will not work with this version of the driver.

For new versions of the driver, patches for older kernels and the updated
userspace tools, please follow the links at the end of this file.


Table of Contents
=================

Installation
Bond Configuration
Module Parameters
Configuring Multiple Bonds
Switch Configuration
Verifying Bond Configuration
Frequently Asked Questions
High Availability
Promiscuous Sniffing notes
8021q VLAN support
Limitations
Resources and Links

Installation
============

1) Build kernel with the bonding driver
---------------------------------------
For the latest version of the bonding driver, use kernel 2.4.12 or above
(otherwise you will need to apply a patch).

Configure kernel with `make menuconfig/xconfig/config', and select "Bonding
driver support" in the "Network device support" section. It is recommended
to configure the driver as a module since it is currently the only way to
pass parameters to the driver and configure more than one bonding device.

Build and install the new kernel and modules.

2) Get and install the userspace tools
--------------------------------------
This version of the bonding driver requires an updated ifenslave program. The
original one from extreme-linux and beowulf will not work. Kernels 2.4.12
and above include the updated version of ifenslave.c in the
Documentation/networking directory. For older kernels, please follow the
links at the end of this file.

IMPORTANT!!! If you are running on Red Hat 7.1 or greater, you need
to be careful because /usr/include/linux is no longer a symbolic link
to /usr/src/linux/include/linux. If you build ifenslave while this is
true, ifenslave will appear to succeed but your bond won't work. The purpose
of the -I option on the ifenslave compile line is to make sure it uses
/usr/src/linux/include/linux/if_bonding.h instead of the version from
/usr/include/linux.

To install ifenslave.c, do:
    # gcc -Wall -Wstrict-prototypes -O -I/usr/src/linux/include ifenslave.c -o ifenslave
    # cp ifenslave /sbin/ifenslave

Bond Configuration
==================

You will need to add at least the following line to /etc/modules.conf
so the bonding driver will automatically load when the bond0 interface is
configured. Refer to the modules.conf manual page for specific modules.conf
syntax details. The Module Parameters section of this document describes each
bonding driver parameter.

    alias bond0 bonding

Use standard distribution techniques to define the bond0 network interface. For
example, on modern Red Hat distributions, create an ifcfg-bond0 file in
the /etc/sysconfig/network-scripts directory that resembles the following:

    DEVICE=bond0
    IPADDR=192.168.1.1
    NETMASK=255.255.255.0
    NETWORK=192.168.1.0
    BROADCAST=192.168.1.255
    ONBOOT=yes
    BOOTPROTO=none
    USERCTL=no

(use appropriate values for your network above)

All interfaces that are part of a bond should have SLAVE and MASTER
definitions. For example, in the case of Red Hat, if you wish to make eth0 and
eth1 a part of the bonding interface bond0, their config files (ifcfg-eth0 and
ifcfg-eth1) should resemble the following:

    DEVICE=eth0
    USERCTL=no
    ONBOOT=yes
    MASTER=bond0
    SLAVE=yes
    BOOTPROTO=none

Use DEVICE=eth1 in the ifcfg-eth1 config file. If you configure a second
bonding interface (bond1), use MASTER=bond1 in the config file to make the
network interface be a slave of bond1.

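The SLAVE and MASTER requirement can be checked mechanically before the
network is restarted. The following is a minimal sketch in Python and is not
part of the bonding tools: parse_ifcfg and check_slave are hypothetical helper
names, and the sketch assumes plain KEY=VALUE lines like the ones in the
ifcfg-eth0 example.

```python
def parse_ifcfg(text):
    """Parse simple KEY=VALUE lines (as in the ifcfg examples) into a dict."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def check_slave(settings, master="bond0"):
    """Return the problems that would keep this interface out of the bond."""
    problems = []
    if settings.get("SLAVE") != "yes":
        problems.append("SLAVE=yes is missing")
    if settings.get("MASTER") != master:
        problems.append("MASTER=%s is missing" % master)
    return problems


# Contents of the ifcfg-eth0 example from this section.
ETH0 = """\
DEVICE=eth0
USERCTL=no
ONBOOT=yes
MASTER=bond0
SLAVE=yes
BOOTPROTO=none
"""

print(check_slave(parse_ifcfg(ETH0)))  # an empty list means no problems found
```

In practice the text would be read from the files in
/etc/sysconfig/network-scripts rather than from a string.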
Restart the networking subsystem or just bring up the bonding device if your
administration tools allow it. Otherwise, reboot. On Red Hat distros you can
issue `ifup bond0' or `/etc/rc.d/init.d/network restart'.

If the administration tools of your distribution do not support
master/slave notation in configuring network interfaces, you will need to
manually configure the bonding device with the following commands:

    # /sbin/ifconfig bond0 192.168.1.1 netmask 255.255.255.0 \
          broadcast 192.168.1.255 up

    # /sbin/ifenslave bond0 eth0
    # /sbin/ifenslave bond0 eth1

(use appropriate values for your network above)

You can then create a script containing these commands and place it in the
appropriate rc directory.

If you specifically need all network drivers loaded before the bonding driver,
adding the following line to modules.conf will cause the network driver for
eth0 and eth1 to be loaded before the bonding driver.

    probeall bond0 eth0 eth1 bonding

Be careful not to reference bond0 itself at the end of the line, or modprobe
will die in an endless recursive loop.

If running SNMP agents, the bonding driver should be loaded before any network
drivers participating in a bond. This requirement is due to the interface
index (ipAdEntIfIndex) being associated to the first interface found with a
given IP address. That is, there is only one ipAdEntIfIndex for each IP
address. For example, if eth0 and eth1 are slaves of bond0 and the driver for
eth0 is loaded before the bonding driver, the interface for the IP address
will be associated with the eth0 interface. This configuration is shown below:
the IP address 192.168.1.1 has an interface index of 2, which indexes to eth0
in the ifDescr table (ifDescr.2).

    interfaces.ifTable.ifEntry.ifDescr.1 = lo
    interfaces.ifTable.ifEntry.ifDescr.2 = eth0
    interfaces.ifTable.ifEntry.ifDescr.3 = eth1
    interfaces.ifTable.ifEntry.ifDescr.4 = eth2
    interfaces.ifTable.ifEntry.ifDescr.5 = eth3
    interfaces.ifTable.ifEntry.ifDescr.6 = bond0
    ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.10.10.10 = 5
    ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.192.168.1.1 = 2
    ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.74.20.94 = 4
    ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.127.0.0.1 = 1

This problem is avoided by loading the bonding driver before any network
drivers participating in a bond. Below is an example of loading the bonding
driver first; the IP address 192.168.1.1 is now correctly associated with
ifDescr.2, which is bond0.

    interfaces.ifTable.ifEntry.ifDescr.1 = lo
    interfaces.ifTable.ifEntry.ifDescr.2 = bond0
    interfaces.ifTable.ifEntry.ifDescr.3 = eth0
    interfaces.ifTable.ifEntry.ifDescr.4 = eth1
    interfaces.ifTable.ifEntry.ifDescr.5 = eth2
    interfaces.ifTable.ifEntry.ifDescr.6 = eth3
    ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.10.10.10 = 6
    ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.192.168.1.1 = 2
    ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.74.20.94 = 5
    ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.127.0.0.1 = 1

While some distributions may not report the interface name in ifDescr,
the association between the IP address and IfIndex remains and SNMP
functions such as Interface_Scan_Next will report that association.

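The two SNMP listings can be read as a pair of lookup tables: the address
table yields an interface index, and the ifDescr table turns that index into
a name. The sketch below, in Python, only restates the sample values from the
listings; the function and variable names are made up for illustration and no
SNMP library is involved.

```python
def interface_for_address(if_descr, ip_ad_ent_if_index, address):
    """Resolve an IP address to an interface name via its ipAdEntIfIndex."""
    return if_descr[ip_ad_ent_if_index[address]]


# Network driver for eth0 loaded before the bonding driver.
ETH_FIRST_DESCR = {1: "lo", 2: "eth0", 3: "eth1", 4: "eth2", 5: "eth3", 6: "bond0"}
ETH_FIRST_INDEX = {"10.10.10.10": 5, "192.168.1.1": 2,
                   "10.74.20.94": 4, "127.0.0.1": 1}

# Bonding driver loaded first.
BOND_FIRST_DESCR = {1: "lo", 2: "bond0", 3: "eth0", 4: "eth1", 5: "eth2", 6: "eth3"}
BOND_FIRST_INDEX = {"10.10.10.10": 6, "192.168.1.1": 2,
                    "10.74.20.94": 5, "127.0.0.1": 1}

print(interface_for_address(ETH_FIRST_DESCR, ETH_FIRST_INDEX, "192.168.1.1"))    # eth0
print(interface_for_address(BOND_FIRST_DESCR, BOND_FIRST_INDEX, "192.168.1.1"))  # bond0
```

The index for 192.168.1.1 is 2 in both cases; only the load order decides
whether index 2 names a slave or the bond.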
Module Parameters
=================

Optional parameters for the bonding driver can be supplied as command line
arguments to the insmod command. Typically, these parameters are specified in
the file /etc/modules.conf (see the manual page for modules.conf). The
available bonding driver parameters are listed below. If a parameter is not
specified the default value is used. When initially configuring a bond, it
is recommended that "tail -f /var/log/messages" be run in a separate window
to watch for bonding driver error messages.

It is critical that either the miimon or arp_interval and arp_ip_target
parameters be specified, otherwise serious network degradation will occur
during link failures.

arp_interval

        Specifies the ARP monitoring frequency in milliseconds.
        If ARP monitoring is used in a load-balancing mode (mode 0 or 2), the
        switch should be configured in a mode that evenly distributes packets
        across all links - such as round-robin. If the switch is configured to
        distribute the packets in an XOR fashion, all replies from the ARP
        targets will be received on the same link which could cause the other
        team members to fail. ARP monitoring should not be used in conjunction
        with miimon. A value of 0 disables ARP monitoring. The default value
        is 0.

arp_ip_target

        Specifies the IP addresses to use when arp_interval is > 0. These
        are the targets of the ARP request sent to determine the health of
        the link to the targets. Specify these values in ddd.ddd.ddd.ddd
        format. Multiple IP addresses must be separated by a comma. At least
        one IP address needs to be given for ARP monitoring to work. The
        maximum number of targets that can be specified is set at 16.

downdelay

        Specifies the delay time in milliseconds to disable a link after a
        link failure has been detected. This should be a multiple of the
        miimon value, otherwise the value will be rounded. The default value
        is 0.

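To see what "rounded" means for a delay that is not a multiple of miimon, the
arithmetic can be sketched in Python. This is an illustration only: the text
above does not say in which direction the value is rounded, so the sketch
ASSUMES rounding down to the nearest multiple, and effective_delay is a
hypothetical helper name. The same arithmetic would apply to updelay, which
is described in the same terms.

```python
def effective_delay(delay_ms, miimon_ms):
    """Return the delay after rounding to a multiple of miimon.

    Assumption: the value is rounded down; the text only says "rounded".
    """
    if miimon_ms <= 0:
        raise ValueError("miimon must be greater than 0 to form a multiple")
    return (delay_ms // miimon_ms) * miimon_ms


print(effective_delay(250, 100))  # 200 under the round-down assumption
print(effective_delay(200, 100))  # 200, already a multiple
```

Choosing a delay that is already a multiple of miimon avoids depending on the
rounding behaviour at all.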
lacp_rate

        Option specifying the rate at which we'll ask our link partner to
        transmit LACPDU packets in 802.3ad mode. Possible values are:

        slow or 0
                Request partner to transmit LACPDUs every 30 seconds (default)

        fast or 1
                Request partner to transmit LACPDUs every 1 second

max_bonds

        Specifies the number of bonding devices to create for this
        instance of the bonding driver. E.g., if max_bonds is 3, and
        the bonding driver is not already loaded, then bond0, bond1
        and bond2 will be created. The default value is 1.

miimon

        Specifies the frequency in milliseconds that MII link monitoring
        will occur. A value of zero disables MII link monitoring. A value
        of 100 is a good starting point. See the High Availability section
        for additional information. The default value is 0.

mode

        Specifies one of the bonding policies. The default is
        round-robin (balance-rr). Possible values are (you can use
        either the text or numeric option):

        balance-rr or 0

                Round-robin policy: Transmit in a sequential order
                from the first available slave through the last. This
                mode provides load balancing and fault tolerance.

        active-backup or 1

                Active-backup policy: Only one slave in the bond is
                active. A different slave becomes active if, and only
                if, the active slave fails. The bond's MAC address is
                externally visible on only one port (network adapter)
                to avoid confusing the switch. This mode provides
                fault tolerance.

        balance-xor or 2

                XOR policy: Transmit based on [(source MAC address
                XOR'd with destination MAC address) modulo slave
                count]. This selects the same slave for each
                destination MAC address. This mode provides load
                balancing and fault tolerance.

        broadcast or 3

                Broadcast policy: transmits everything on all slave
                interfaces. This mode provides fault tolerance.

        802.3ad or 4

                IEEE 802.3ad Dynamic link aggregation. Creates aggregation
                groups that share the same speed and duplex settings.
                Transmits and receives on all slaves in the active
                aggregator.

                Prerequisites:

                1. Ethtool support in the base drivers for retrieving the
                speed and duplex of each slave.

                2. A switch that supports IEEE 802.3ad Dynamic link
                aggregation.

        balance-tlb or 5

                Adaptive transmit load balancing: channel bonding that does
                not require any special switch support. The outgoing
                traffic is distributed according to the current load
                (computed relative to the speed) on each slave. Incoming
                traffic is received by the current slave. If the receiving
                slave fails, another slave takes over the MAC address of
                the failed receiving slave.

                Prerequisite:

                Ethtool support in the base drivers for retrieving the
                speed of each slave.

        balance-alb or 6

                Adaptive load balancing: includes balance-tlb + receive
                load balancing (rlb) for IPv4 traffic and does not require
                any special switch support. The receive load balancing is
                achieved by ARP negotiation. The bonding driver intercepts
                the ARP Replies sent by the server on their way out and
                overwrites the src hw address with the unique hw address of
                one of the slaves in the bond such that different clients
                use different hw addresses for the server.

                Receive traffic from connections created by the server is
                also balanced. When the server sends an ARP Request the
                bonding driver copies and saves the client's IP information
                from the ARP. When the ARP Reply arrives from the client,
                its hw address is retrieved and the bonding driver
                initiates an ARP reply to this client assigning it to one
                of the slaves in the bond. A problematic outcome of using
                ARP negotiation for balancing is that each time that an ARP
                request is broadcast it uses the hw address of the
                bond. Hence, clients learn the hw address of the bond and
                the balancing of receive traffic collapses to the current
                slave. This is handled by sending updates (ARP Replies) to
                all the clients with their assigned hw address such that
                the traffic is redistributed. Receive traffic is also
                redistributed when a new slave is added to the bond and
                when an inactive slave is re-activated. The receive load is
                distributed sequentially (round robin) among the group of
                highest speed slaves in the bond.

                When a link is reconnected or a new slave joins the bond
                the receive traffic is redistributed among all active
                slaves in the bond by initiating ARP Replies with the
                selected MAC address to each of the clients. The updelay
                modprobe parameter must be set to a value equal to or
                greater than the switch's forwarding delay so that the ARP
                Replies sent to the clients will not be blocked by the
                switch.

                Prerequisites:

                1. Ethtool support in the base drivers for retrieving the
                speed of each slave.

                2. Base driver support for setting the hw address of a
                device while it is open. This is required so that there
                will always be one slave in the team using the bond hw
                address (the curr_active_slave) while having a unique hw
                address for each slave in the bond. If the curr_active_slave
                fails, its hw address is swapped with that of the newly
                chosen curr_active_slave.

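The slave selection rules for the round-robin and XOR policies listed under
mode are simple enough to sketch. The Python below is an illustration, not
the driver's code: the function names are made up, and it ASSUMES that the
XOR policy combines the full 48-bit addresses (the description does not say
whether the driver uses the whole address or only part of it).

```python
def mac_to_int(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' to an integer."""
    return int(mac.replace(":", ""), 16)


def xor_slave(src_mac, dst_mac, slave_count):
    """balance-xor: (source MAC XOR destination MAC) modulo slave count."""
    return (mac_to_int(src_mac) ^ mac_to_int(dst_mac)) % slave_count


def round_robin(slaves, link_up, packets):
    """balance-rr: hand each packet to the next available slave in order."""
    chosen = []
    position = 0
    for _ in range(packets):
        # Try each slave at most once per packet, skipping links that are down.
        for _ in range(len(slaves)):
            candidate = slaves[position % len(slaves)]
            position += 1
            if link_up[candidate]:
                chosen.append(candidate)
                break
    return chosen


SRC = "00:c0:f0:1f:37:b4"  # example source address
print(xor_slave(SRC, "00:11:22:33:44:55", 2))  # 1
print(xor_slave(SRC, "00:11:22:33:44:56", 2))  # 0
print(round_robin(["eth0", "eth1"], {"eth0": True, "eth1": True}, 4))
```

Because the XOR result depends only on the two addresses, a given
destination always maps to the same slave index, whereas round-robin ignores
the addresses entirely.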
primary

        A string (eth0, eth2, etc.) naming the primary device. If this
        value is given and the device is on-line, it will be used first
        as the output medium. Alternate devices will be used only when
        this device is off-line. Without a primary, once a failover is
        detected and a new default output is chosen, that output remains
        in use until it too fails. A primary is useful when one slave is
        preferred over another, e.g., when one slave is 1000Mbps and
        another is 100Mbps. If the 1000Mbps slave fails and is later
        restored, it may be preferable for the faster slave to gracefully
        become the active slave again, without deliberately failing the
        100Mbps slave. Specifying a primary is only valid in active-backup
        mode.

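The effect of the primary parameter on failover can be modelled in a few
lines. This Python sketch is a simplified reading of the description of
primary, not the driver's logic; next_active and its arguments are
hypothetical names.

```python
def next_active(current, slaves, link_up, primary=None):
    """Pick the active slave for active-backup mode.

    slaves is an ordered list of names; link_up maps slave name -> bool.
    """
    # A primary that is on-line is always used first.
    if primary is not None and link_up.get(primary):
        return primary
    # Without a usable primary, the current slave stays active until it fails.
    if current is not None and link_up.get(current):
        return current
    # Otherwise fail over to the first slave whose link is up.
    for slave in slaves:
        if link_up.get(slave):
            return slave
    return None


SLAVES = ["eth0", "eth1"]          # say eth0 is 1000Mbps and eth1 is 100Mbps
BOTH_UP = {"eth0": True, "eth1": True}

# eth0 failed earlier, eth1 took over, and eth0 has now been restored.
print(next_active("eth1", SLAVES, BOTH_UP, primary="eth0"))  # eth0
print(next_active("eth1", SLAVES, BOTH_UP))                  # eth1
```

With a primary set, the restored faster link takes over again; without one,
the bond stays on the slower link until that link fails.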
updelay

        Specifies the delay time in milliseconds to enable a link after a
        link up status has been detected. This should be a multiple of the
        miimon value, otherwise the value will be rounded. The default value
        is 0.

use_carrier

        Specifies whether miimon should use MII or ETHTOOL ioctls or
        netif_carrier_ok() to determine the link status.
        The MII or ETHTOOL ioctls are less efficient and utilize a
        deprecated calling sequence within the kernel. The
        netif_carrier_ok() relies on the device driver to maintain its
        state with netif_carrier_on/off; at this writing, most, but
        not all, device drivers support this facility.

        If bonding insists that the link is up when it should not be,
        it may be that your network device driver does not support
        netif_carrier_on/off. This is because the default state for
        netif_carrier is "carrier on." In this case, disabling
        use_carrier will cause bonding to revert to the MII / ETHTOOL
        ioctl method to determine the link state.

        A value of 1 enables the use of netif_carrier_ok(), a value of
        0 will use the MII / ETHTOOL ioctls. The default value is 1.

Configuring Multiple Bonds
==========================

If several bonding interfaces are required, either specify the max_bonds
parameter (described above), or load the driver multiple times. Using
the max_bonds parameter is less complicated, but has the limitation that
all bonding instances created will have the same options. Loading the
driver multiple times allows each instance of the driver to have differing
options.

For example, to configure two bonding interfaces, one with MII link
monitoring performed every 100 milliseconds, and one with ARP link
monitoring performed every 200 milliseconds, /etc/modules.conf should
resemble the following:

    alias bond0 bonding
    alias bond1 bonding

    options bond0 miimon=100
    options bond1 -o bonding1 arp_interval=200 arp_ip_target=10.0.0.1

Configuring Multiple ARP Targets
================================

While ARP monitoring can be done with just one target, it can be useful
in a High Availability setup to have several targets to monitor. In the
case of just one target, the target itself may go down or have a problem
making it unresponsive to ARP requests. Having an additional target (or
several) increases the reliability of the ARP monitoring.

Multiple ARP targets must be separated by commas as follows:

    # example options for ARP monitoring with three targets
    alias bond0 bonding
    options bond0 arp_interval=60 arp_ip_target=192.168.0.1,192.168.0.3,192.168.0.9

For just a single target the options would resemble:

    # example options for ARP monitoring with one target
    alias bond0 bonding
    options bond0 arp_interval=60 arp_ip_target=192.168.0.100

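An arp_ip_target value can be sanity-checked before it goes into
modules.conf. The following Python sketch is not part of the driver or its
tools; parse_arp_ip_target is a hypothetical helper that applies the rules
stated in this document: comma-separated dotted-quad addresses, at least one
and at most 16 of them.

```python
import ipaddress

MAX_ARP_TARGETS = 16  # limit stated in the arp_ip_target description


def parse_arp_ip_target(value):
    """Split a comma-separated arp_ip_target value and validate each address."""
    targets = [item.strip() for item in value.split(",") if item.strip()]
    if not targets:
        raise ValueError("at least one IP address is needed for ARP monitoring")
    if len(targets) > MAX_ARP_TARGETS:
        raise ValueError("at most %d targets can be specified" % MAX_ARP_TARGETS)
    for target in targets:
        ipaddress.IPv4Address(target)  # raises ValueError on a malformed address
    return targets


print(parse_arp_ip_target("192.168.0.1,192.168.0.3,192.168.0.9"))
print(parse_arp_ip_target("192.168.0.100"))
```

A malformed address or a list of more than 16 targets raises ValueError
instead of being passed on to the driver.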
Potential Problems When Using ARP Monitor
=========================================

1. Driver support

The ARP monitor relies on the network device driver to maintain two
statistics: the last receive time (dev->last_rx), and the last
transmit time (dev->trans_start). If the network device driver does
not update one or both of these, then the typical result will be that,
upon startup, all links in the bond will immediately be declared down,
and remain that way. A network monitoring tool (e.g., tcpdump) will
show ARP requests and replies being sent and received on the bonding
device.

The possible resolutions for this are to (a) fix the device driver, or
(b) discontinue the ARP monitor (using miimon as an alternative, for
example).

2. Adventures in Routing

When bonding is set up with the ARP monitor, it is important that the
slave devices not have routes that supersede routes of the master (or,
generally, not have routes at all). For example, suppose the bonding
device bond0 has two slaves, eth0 and eth1, and the routing table is
as follows:

    Kernel IP routing table
    Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
    10.0.0.0        0.0.0.0         255.255.0.0     U        40 0          0 eth0
    10.0.0.0        0.0.0.0         255.255.0.0     U        40 0          0 eth1
    10.0.0.0        0.0.0.0         255.255.0.0     U        40 0          0 bond0
    127.0.0.0       0.0.0.0         255.0.0.0       U        40 0          0 lo

In this case, the ARP monitor (and ARP itself) may become confused,
because ARP requests will be sent on one interface (bond0), but the
corresponding reply will arrive on a different interface (eth0). This
reply looks to ARP like an unsolicited ARP reply (because ARP matches
replies on an interface basis), and is discarded. This will likely
still update the receive/transmit times in the driver, but will lose
packets.

The resolution here is simply to ensure that slaves do not have routes
of their own, and if for some reason they must, those routes do not
supersede routes of their master. This should generally be the case,
but unusual configurations or errant manual or automatic static route
additions may cause trouble.

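Slave routes like the ones in the routing table above can be spotted with a
short script. This Python sketch is illustrative only: shadowing_routes is a
hypothetical helper, the route list is typed in by hand from the example
table, and it only detects slave routes that exactly duplicate a master
route (a more specific slave route would need a real prefix comparison).

```python
def shadowing_routes(routes, master, slaves):
    """Return slave routes whose destination/genmask is also routed via the master.

    routes is a list of (destination, genmask, iface) tuples.
    """
    master_nets = {(dest, mask) for dest, mask, iface in routes if iface == master}
    return [
        (dest, mask, iface)
        for dest, mask, iface in routes
        if iface in slaves and (dest, mask) in master_nets
    ]


# Destination, Genmask and Iface columns of the example routing table.
ROUTES = [
    ("10.0.0.0", "255.255.0.0", "eth0"),
    ("10.0.0.0", "255.255.0.0", "eth1"),
    ("10.0.0.0", "255.255.0.0", "bond0"),
    ("127.0.0.0", "255.0.0.0", "lo"),
]

for route in shadowing_routes(ROUTES, "bond0", ["eth0", "eth1"]):
    print("slave route conflicts with master:", route)
```

For the example table both the eth0 and the eth1 routes are reported, which
matches the configuration the text warns about.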
Switch Configuration
====================

While the switch does not need to be configured when the active-backup,
balance-tlb or balance-alb policies (mode=1,5,6) are used, it does need to
be configured for the round-robin, XOR, broadcast, or 802.3ad policies
(mode=0,2,3,4).


Verifying Bond Configuration
============================

1) Bonding information files
----------------------------
The bonding driver information files reside in the /proc/net/bonding directory.

Sample contents of /proc/net/bonding/bond0 after the driver is loaded with
parameters of mode=0 and miimon=1000 are shown below.

    Bonding Mode: load balancing (round-robin)
    Currently Active Slave: eth0
    MII Status: up
    MII Polling Interval (ms): 1000
    Up Delay (ms): 0
    Down Delay (ms): 0

    Slave Interface: eth1
    MII Status: up
    Link Failure Count: 1

    Slave Interface: eth0
    MII Status: up
    Link Failure Count: 1

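The information file is easy to process in a script, for example to list
slaves whose link is down. The Python below is a sketch that ASSUMES the
"Key: value" layout of the sample above (other driver versions may print
different fields); parse_bond_status is a hypothetical helper, and the sample
text is embedded so the example runs without a bonding device.

```python
SAMPLE = """\
Bonding Mode: load balancing (round-robin)
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 1000
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth1
MII Status: up
Link Failure Count: 1

Slave Interface: eth0
MII Status: up
Link Failure Count: 1
"""


def parse_bond_status(text):
    """Split the bond-level fields from the per-slave fields."""
    bond, slaves, current = {}, {}, None
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        if key == "Slave Interface":
            # Every field after this line belongs to the named slave.
            current = slaves.setdefault(value, {})
        elif current is None:
            bond[key] = value
        else:
            current[key] = value
    return bond, slaves


bond, slaves = parse_bond_status(SAMPLE)
down = [name for name, info in slaves.items() if info.get("MII Status") != "up"]
print(bond["Bonding Mode"], "- slaves down:", down)
```

On a live system the text would come from reading /proc/net/bonding/bond0
instead of the embedded sample.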
2) Network verification
-----------------------
The network configuration can be verified using the ifconfig command. In
the example below, the bond0 interface is the master (MASTER) while eth0 and
eth1 are slaves (SLAVE). Notice that all slaves of bond0 have the same MAC
address (HWaddr) as bond0. This holds for all modes except TLB and ALB, which
require a unique MAC address for each slave.

[root]# /sbin/ifconfig
bond0     Link encap:Ethernet  HWaddr 00:C0:F0:1F:37:B4
          inet addr:XXX.XXX.XXX.YYY  Bcast:XXX.XXX.XXX.255  Mask:255.255.252.0
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:7224794 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3286647 errors:1 dropped:0 overruns:1 carrier:0
          collisions:0 txqueuelen:0

eth0      Link encap:Ethernet  HWaddr 00:C0:F0:1F:37:B4
          inet addr:XXX.XXX.XXX.YYY  Bcast:XXX.XXX.XXX.255  Mask:255.255.252.0
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:3573025 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1643167 errors:1 dropped:0 overruns:1 carrier:0
          collisions:0 txqueuelen:100
          Interrupt:10 Base address:0x1080

eth1      Link encap:Ethernet  HWaddr 00:C0:F0:1F:37:B4
          inet addr:XXX.XXX.XXX.YYY  Bcast:XXX.XXX.XXX.255  Mask:255.255.252.0
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:3651769 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1643480 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          Interrupt:9 Base address:0x1400

Frequently Asked Questions
==========================

1.  Is it SMP safe?

        Yes. The old 2.0.xx channel bonding patch was not SMP safe.
        The new driver was designed to be SMP safe from the start.

2.  What type of cards will work with it?

        Any Ethernet type cards (you can even mix cards - an Intel
        EtherExpress PRO/100 and a 3com 3c905b, for example).
        You can even bond together Gigabit Ethernet cards!

3.  How many bonding devices can I have?

        There is no limit.

4.  How many slaves can a bonding device have?

        Limited by the number of network interfaces Linux supports and/or the
        number of network cards you can place in your system.

5.  What happens when a slave link dies?

        If your ethernet cards support MII or ETHTOOL link status monitoring
        and the MII monitoring has been enabled in the driver (see description
        of module parameters), there will be no adverse consequences. This
        release of the bonding driver knows how to get the MII information and
        enables or disables its slaves according to their link status.
        See section on High Availability for additional information.

        For ethernet cards not supporting MII status, the arp_interval and
        arp_ip_target parameters must be specified for bonding to work
        correctly. If packets have not been sent or received during the
        specified arp_interval duration, an ARP request is sent to the
        targets to generate send and receive traffic. If after this
        interval, either the successful send and/or receive count has not
        incremented, the next slave in the sequence will become the active
        slave.

        If neither miimon nor arp_interval is configured, the bonding
        driver will not handle this situation very well. The driver will
        continue to send packets but some packets will be lost. Retransmits
        will cause serious degradation of performance (in the case when one
        of two slave links fails, 50% of packets will be lost, which is a
        serious problem for both TCP and UDP).

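The 50% figure follows from simple arithmetic, sketched here in Python. The
sketch ASSUMES that packets are spread evenly over all slaves (as in
round-robin mode) and that, with no link monitoring configured, failed
slaves keep receiving their share; lost_fraction is a hypothetical helper
name.

```python
from fractions import Fraction


def lost_fraction(slave_count, failed_count):
    """Fraction of packets lost when failed links are still being used.

    Assumes an even spread of packets over all slaves and no link
    monitoring, so failed slaves are never skipped.
    """
    if slave_count == 0 or not 0 <= failed_count <= slave_count:
        raise ValueError("failed_count must be between 0 and slave_count")
    return Fraction(failed_count, slave_count)


print(float(lost_fraction(2, 1)))  # 0.5, the 50% case described above
print(float(lost_fraction(4, 1)))  # 0.25
```

Adding more slaves lowers the fraction but never removes the loss, which is
why one of the monitoring methods should be configured.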
614 |
|
|
6. Can bonding be used for High Availability?
|
615 |
|
|
|
616 |
|
|
Yes, if you use MII monitoring and ALL your cards support MII link
|
617 |
|
|
status reporting. See section on High Availability for more
|
618 |
|
|
information.
|
619 |
|
|
|
620 |
|
|
7. Which switches/systems does it work with?
|
621 |
|
|
|
622 |
|
|
In round-robin and XOR mode, it works with systems that support
|
623 |
|
|
trunking:
|
624 |
|
|
|
625 |
|
|
* Many Cisco switches and routers (look for EtherChannel support).
|
626 |
|
|
* SunTrunking software.
|
627 |
|
|
* Alteon AceDirector switches / WebOS (use Trunks).
|
628 |
|
|
* BayStack Switches (trunks must be explicitly configured). Stackable
|
629 |
|
|
models (450) can define trunks between ports on different physical
|
630 |
|
|
units.
|
631 |
|
|
* Linux bonding, of course !
|
632 |
|
|
|
633 |
|
|
In 802.3ad mode, it works with with systems that support IEEE 802.3ad
|
634 |
|
|
Dynamic Link Aggregation:
|
635 |
|
|
|
636 |
|
|
* Extreme networks Summit 7i (look for link-aggregation).
|
637 |
|
|
* Many Cisco switches and routers (look for LACP support; this may
|
638 |
|
|
require an upgrade to your IOS software; LACP support was added
|
639 |
|
|
by Cisco in late 2002).
|
640 |
|
|
* Foundry Big Iron 4000
|
641 |
|
|
|
642 |
|
|
In active-backup, balance-tlb and balance-alb modes, it should work
|
643 |
|
|
with any Layer-II switch.
|
644 |
|
|
|
645 |
|
|
|
8. Where does a bonding device get its MAC address from?

If not explicitly configured with ifconfig, the MAC address of the
bonding device is taken from its first slave device. This MAC address
is then passed to all following slaves and remains persistent (even if
the first slave is removed) until the bonding device is brought
down or reconfigured.

If you wish to change the MAC address, you can set it with ifconfig:

        # ifconfig bond0 hw ether 00:11:22:33:44:55

The MAC address can also be changed by bringing down/up the device
and then changing its slaves (or their order):

        # ifconfig bond0 down ; modprobe -r bonding
        # ifconfig bond0 .... up
        # ifenslave bond0 eth...

This method will automatically take the address from the next slave
that will be added.

To restore your slaves' MAC addresses, you need to detach them
from the bond (`ifenslave -d bond0 eth0'). The bonding driver will then
restore the MAC addresses that the slaves had before they were enslaved.

9. Which transmit policies can be used?

Round-robin: based on the order of enslaving, the output device
is selected based on the next available slave, regardless of
the source and/or destination of the packet.

Active-backup policy ensures that one and only one device will
transmit at any given moment. Active-backup policy is useful for
implementing high availability solutions using two hubs (see
section on High Availability).

XOR: based on (src hw addr XOR dst hw addr) % slave count. This
policy selects the same slave for each destination hw address.

Broadcast policy transmits everything on all slave interfaces.

802.3ad: based on XOR, but distributes traffic among all interfaces
in the active aggregator.

Transmit load balancing (balance-tlb) balances the traffic
according to the current load on each slave. The balancing is
client based, and the least loaded slave is selected for each new
client. The load of each slave is calculated relative to its speed,
which enables load balancing in mixed speed teams.

Adaptive load balancing (balance-alb) uses the Transmit load
balancing for the transmit load. The receive load is balanced only
among the group of highest speed active slaves in the bond. The
load is distributed with round-robin, i.e. to the next available slave in
the high speed group of active slaves.
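
As a rough illustration of the XOR policy, the following shell sketch computes
a slave index from two hardware addresses. It assumes that only the last octet
of each address enters the XOR; that is a simplification made for this example,
not something this document states. The function name xor_slave is purely
illustrative and is not part of the driver or of ifenslave.

```sh
# Hypothetical helper: pick a slave index for the XOR policy.
# Assumption: only the last octet of each hardware address is XORed.
# usage: xor_slave <src hw addr> <dst hw addr> <slave count>
xor_slave() {
    src_last=${1##*:}    # text after the last ':' of the source address
    dst_last=${2##*:}    # text after the last ':' of the destination address
    echo $(( (0x$src_last ^ 0x$dst_last) % $3 ))
}

# 0x55 XOR 0xee = 0xbb = 187, and 187 % 2 = 1, so this prints 1.
xor_slave 00:11:22:33:44:55 00:aa:bb:cc:dd:ee 2
```

Because the source address of the bond does not change between packets, a
given destination address always yields the same index, which is the property
the XOR policy relies on.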

High Availability
=================

To implement high availability using the bonding driver, the driver needs to be
compiled as a module, because currently it is the only way to pass parameters
to the driver. This may change in the future.

High availability is achieved by using MII or ETHTOOL status reporting. You
need to verify that all your interfaces support MII or ETHTOOL link status
reporting. On Linux kernel 2.2.17, all the 100 Mbps capable drivers and the
yellowfin gigabit driver support MII. To determine if ETHTOOL link reporting
is available for interface eth0, type "ethtool eth0" and the "Link detected:"
line should contain the correct link status. If your system has an interface
that does not support MII or ETHTOOL status reporting, a failure of its link
will not be detected! A message indicating that MII and ETHTOOL are not
supported by a network driver is logged when the bonding driver is loaded with
a non-zero miimon value.
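
The "Link detected:" check can be scripted. The sketch below is a hypothetical
helper (the name link_state is not part of ethtool or of the bonding driver).
It reads the output of "ethtool <interface>" on standard input and prints the
reported link status, or "unknown" when no such line is present, which may
point to a driver without ETHTOOL link reporting.

```sh
# Hypothetical helper: extract the link status from `ethtool <iface>` output.
# Reads the ethtool output on stdin; prints the value of the
# "Link detected:" line, or "unknown" if that line is missing.
link_state() {
    awk -F': *' '
        /Link detected:/ { print $2; found = 1 }
        END { if (!found) print "unknown" }
    '
}

# Typical use (assumes ethtool is installed):
#   ethtool eth0 | link_state
```

Running this against every interface before enslaving it is one way to spot a
card whose link failures would go unnoticed.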

The bonding driver can regularly check all its slaves' links using the ETHTOOL
IOCTL (ETHTOOL_GLINK command) or by checking the MII status registers. The
check interval is specified by the module argument "miimon" (MII monitoring).
It takes an integer that represents the checking time in milliseconds. It
should not come too close to (1000/HZ) (10 milliseconds on i386) because it
may then reduce the system interactivity. A value of 100 seems to be a good
starting point. It means that a dead link will be detected at most 100
milliseconds after it goes down.

Example:

        # modprobe bonding miimon=100

Or, put the following lines in /etc/modules.conf:

        alias bond0 bonding
        options bond0 miimon=100

There are currently two policies for high availability. They are dependent on
whether:

a) hosts are connected to a single host or switch that supports trunking

b) hosts are connected to several different switches or a single switch that
   does not support trunking


1) High Availability on a single switch or host - load balancing
----------------------------------------------------------------
This is the easiest configuration to set up and to understand. Simply
configure the remote equipment (host or switch) to aggregate traffic over
several ports (Trunk, EtherChannel, etc.) and configure the bonding interfaces.
If the module has been loaded with the proper MII option, it will work
automatically. You can then try to remove and restore different links
and see in your logs what the driver detects. When testing, you may
encounter problems on some buggy switches that disable the trunk for a
long time if all ports in a trunk go down. This is a fault of the switch,
not of Linux (reboot the switch to confirm).

Example 1 : host to host at twice the speed

    +----------+                          +----------+
    |          |eth0                  eth0|          |
    |  Host A  +--------------------------+  Host B  |
    |          +--------------------------+          |
    |          |eth1                  eth1|          |
    +----------+                          +----------+

On each host :
    # modprobe bonding miimon=100
    # ifconfig bond0 addr
    # ifenslave bond0 eth0 eth1

Example 2 : host to switch at twice the speed

    +----------+                          +----------+
    |          |eth0                 port1|          |
    |  Host A  +--------------------------+  switch  |
    |          +--------------------------+          |
    |          |eth1                 port2|          |
    +----------+                          +----------+

On host A :                               On the switch :
    # modprobe bonding miimon=100             # set up a trunk on port1
    # ifconfig bond0 addr                       and port2
    # ifenslave bond0 eth0 eth1


2) High Availability on two or more switches (or a single switch without
   trunking support)
---------------------------------------------------------------------------
This mode is more problematic because it relies on the fact that there
are multiple ports and the host's MAC address should be visible on one
port only to avoid confusing the switches.

If you need to know which interface is the active one, and which ones are
backup, use ifconfig. All backup interfaces have the NOARP flag set.
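
Once a bond is running in this mode, the NOARP check can be scripted. The
helper below is a hypothetical sketch (slave_role is not an existing tool). It
reads the output of "ifconfig <interface>" on standard input and assumes that
the interface flags appear there as separate upper-case words, as in the usual
ifconfig listing.

```sh
# Hypothetical helper: classify a slave from its `ifconfig <iface>` output.
# Reads the ifconfig output on stdin; prints "backup" if the NOARP flag
# appears as a whole word, "active" otherwise.
slave_role() {
    if grep -qw NOARP; then
        echo backup
    else
        echo active
    fi
}

# Typical use in active-backup mode:
#   ifconfig eth0 | slave_role
#   ifconfig eth1 | slave_role
```

The result is only meaningful for interfaces that are enslaved to a bond in
active-backup mode; an unrelated interface without NOARP is also reported as
"active".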

To use this mode, pass "mode=1" to the module at load time :

        # modprobe bonding miimon=100 mode=active-backup

or:

        # modprobe bonding miimon=100 mode=1

Or, put in your /etc/modules.conf :

        alias bond0 bonding
        options bond0 miimon=100 mode=active-backup

Example 1: Using multiple hosts and multiple switches to build a "no single
point of failure" solution.


          |                                     |
          |port3                           port3|
    +-----+----+                          +-----+----+
    |          |port7       ISL      port7|          |
    | switch A +--------------------------+ switch B |
    |          +--------------------------+          |
    |          |port8                port8|          |
    +----++----+                          +-----++---+
    port2||port1                           port1||port2
         ||             +-------+               ||
         |+-------------+ host1 +---------------+|
         |         eth0 +-------+ eth1           |
         |                                       |
         |              +-------+                |
         +--------------+ host2 +----------------+
                   eth0 +-------+ eth1

In this configuration, there is an ISL - Inter Switch Link (could be a trunk),
several servers (host1, host2 ...) each attached to both switches, and one or
more ports to the outside world (port3...). One and only one slave on each host
is active at a time, while all links are still monitored (the system can
detect a failure of active and backup links).

Each time a host changes its active interface, it sticks to the new one until
it goes down. In this example, the hosts are negligibly affected by the
expiration time of the switches' forwarding tables.

If host1 and host2 have the same functionality and are used in load balancing
by another external mechanism, it is good to have host1's active interface
connected to one switch and host2's to the other. Such a system will survive
a failure of a single host, cable, or switch. The worst thing that may happen
in the case of a switch failure is that half of the hosts will be temporarily
unreachable until the other switch expires its tables.

Example 2: Using multiple ethernet cards connected to a switch to configure
NIC failover (switch is not required to support trunking).


    +----------+                          +----------+
    |          |eth0                 port1|          |
    |  Host A  +--------------------------+  switch  |
    |          +--------------------------+          |
    |          |eth1                 port2|          |
    +----------+                          +----------+

On host A :                               On the switch :
    # modprobe bonding miimon=100 mode=1      # (optional) minimize the time
    # ifconfig bond0 addr                     # for table expiration
    # ifenslave bond0 eth0 eth1

Each time the host changes its active interface, it sticks to the new one until
it goes down. In this example, the host is strongly affected by the expiration
time of the switch forwarding table.


3) Adapting to your switches' timing
------------------------------------
If your switches take a long time to go into backup mode, it may be
desirable not to activate a backup interface immediately after a link goes
down. It is possible to delay the moment at which a link will be
completely disabled by passing the module parameter "downdelay" (in
milliseconds, must be a multiple of miimon).

When a switch reboots, it is possible that its ports report "link up" status
before they become usable. This could fool a bond device by causing it to
use some ports that are not ready yet. It is possible to delay the moment at
which an active link will be reused by passing the module parameter "updelay"
(in milliseconds, must be a multiple of miimon).

A similar situation can occur when a host re-negotiates a lost link with the
switch (a case of cable replacement).

A special case is when a bonding interface has lost all slave links. Then the
driver will immediately reuse the first link that goes up, even if the updelay
parameter was specified. (If there are slave interfaces in the "updelay" state,
the interface that first went into that state will be immediately reused.) This
reduces down-time if the value of updelay has been overestimated.

Examples :

        # modprobe bonding miimon=100 mode=1 downdelay=2000 updelay=5000
        # modprobe bonding miimon=100 mode=balance-rr downdelay=0 updelay=5000
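
Since downdelay and updelay must both be multiples of miimon, a wrapper script
can check the values before loading the module. The following is a minimal
sketch; the helper name delay_ok is made up for this example, and the script
only prints the modprobe command instead of running it.

```sh
# Hypothetical helper: check that a delay is a whole multiple of miimon.
# usage: delay_ok <miimon ms> <delay ms>
delay_ok() {
    [ "$1" -gt 0 ] && [ $(( $2 % $1 )) -eq 0 ]
}

miimon=100 downdelay=2000 updelay=5000
if delay_ok "$miimon" "$downdelay" && delay_ok "$miimon" "$updelay"; then
    # Print the command rather than running it; loading the module needs root.
    echo "modprobe bonding miimon=$miimon mode=1 downdelay=$downdelay updelay=$updelay"
else
    echo "downdelay and updelay must be multiples of miimon" >&2
fi
```

A delay of 0 passes the check, which matches the second example above where
downdelay=0 is used.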


Promiscuous Sniffing notes
==========================

If you wish to bond channels together for a network sniffing
application (for example, to run tcpdump, ethereal, or an IDS like
snort with its input aggregated from multiple interfaces using the
bonding driver), then you need to handle the promiscuous interface
setting by hand. Specifically, when you run "ifconfig bond0 up" you
must add the promisc flag there; it will be propagated down to the
slave interfaces at ifenslave time. A full example might look like:

        grep bond0 /etc/modules.conf || echo alias bond0 bonding >>/etc/modules.conf
        ifconfig bond0 promisc up
        for if in eth1 eth2 ...;do
            ifconfig $if up
            ifenslave bond0 $if
        done
        snort ... -i bond0 ...

Ifenslave also tries to propagate addresses from interface to
interface, as appropriate for its designed functions in HA and channel
capacity aggregation. It works fine for unnumbered interfaces, however;
just ignore all the warnings it emits.


8021q VLAN support
==================

It is possible to configure VLAN devices over a bond interface using the 8021q
driver. However, only packets coming from the 8021q driver and passing through
bonding will be tagged by default. Self generated packets, like bonding's
learning packets or ARP packets generated by either ALB mode or the ARP
monitor mechanism, are tagged internally by bonding itself. As a result,
bonding has to "learn" what VLAN IDs are configured on top of it, and it uses
those IDs to tag self generated packets.

For simplicity reasons, and to support the use of adapters that can do VLAN
hardware acceleration offloading, the bonding interface declares itself as
fully hardware offloading capable. It gets the add_vid/kill_vid notifications
to gather the necessary information, and it propagates those actions to the
slaves.
In case of mixed adapter types, hardware accelerated tagged packets that should
go through an adapter that is not offloading capable are "un-accelerated" by the
bonding driver so the VLAN tag sits in the regular location.

VLAN interfaces *must* be added on top of a bonding interface only after
enslaving at least one slave. This is because until the first slave is added the
bonding interface has a HW address of 00:00:00:00:00:00, which will be copied by
the VLAN interface when it is created.
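
A script that creates VLAN interfaces can guard against the all-zero address
before it proceeds. The helper below is a hypothetical sketch (hwaddr_is_set
is not an existing tool). It only inspects an address string that the caller
has already read from the bonding interface, for example from the hardware
address shown by ifconfig.

```sh
# Hypothetical helper: succeed only if the hardware address is set,
# i.e. it is neither empty nor the all-zero placeholder that a bond
# carries before its first slave is added.
hwaddr_is_set() {
    case "$1" in
        ""|00:00:00:00:00:00) return 1 ;;
        *) return 0 ;;
    esac
}

# Example with a literal address; in practice it would be read from bond0.
if hwaddr_is_set "00:00:00:00:00:00"; then
    echo "bond has a hardware address: safe to add VLAN interfaces"
else
    echo "enslave at least one slave before adding VLAN interfaces"
fi
```

With the all-zero address used here, the script takes the second branch and
asks for a slave to be enslaved first.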

Notice that a problem would occur if all slaves are released from a bond that
still has VLAN interfaces on top of it. When new slaves are added later, the
bonding interface would get a HW address from the first slave, which might not
match that of the VLAN interfaces. It is recommended that either all VLANs are
removed and then re-added, or that the bonding interface's HW address is set
manually so it matches the VLANs'. (Note: changing a VLAN interface's HW address
would set the underlying device, i.e. the bonding interface, to promiscuous
mode, which might not be what you want).


Limitations
===========
The main limitations are :
  - only the link status is monitored. If the switch on the other side is
    partially down (e.g. doesn't forward anymore, but the link is OK), the link
    won't be disabled. Another way to check for a dead link could be to count
    incoming frames on a heavily loaded host. This is not applicable to small
    servers, but may be useful when the front switches send multicast
    information on their links (e.g. VRRP), or even health-check the servers.
    Use the arp_interval/arp_ip_target parameters to count incoming/outgoing
    frames.



Resources and Links
===================

Current development on this driver is posted to:
 - http://www.sourceforge.net/projects/bonding/

Donald Becker's Ethernet Drivers and diag programs may be found at :
 - http://www.scyld.com/network/

You will also find a lot of information regarding Ethernet, NWay, MII, etc. at
www.scyld.com.

Patches for 2.2 kernels are at Willy Tarreau's site :
 - http://wtarreau.free.fr/pub/bonding/
 - http://www-miaif.lip6.fr/~tarreau/pub/bonding/

To get the latest information about Linux Kernel development, please consult
the Linux Kernel Mailing List Archives at :
http://www.ussg.iu.edu/hypermail/linux/kernel/

-- END --