                   Linux Ethernet Bonding Driver mini-howto

Initial release : Thomas Davis
Corrections, HA extensions : 2000/10/03-15 :
  - Willy Tarreau
  - Constantine Gavrilov
  - Chad N. Tindel
  - Janice Girouard
  - Jay Vosburgh

Note :
------
The bonding driver originally came from Donald Becker's beowulf patches for
kernel 2.0. It has changed quite a bit since, and the original tools from
extreme-linux and beowulf sites will not work with this version of the driver.

For new versions of the driver, patches for older kernels and the updated
userspace tools, please follow the links at the end of this file.
 
Table of Contents
=================

Installation
Bond Configuration
Module Parameters
Configuring Multiple Bonds
Switch Configuration
Verifying Bond Configuration
Frequently Asked Questions
High Availability
Promiscuous Sniffing notes
8021q VLAN support
Limitations
Resources and Links
 
Installation
============

1) Build kernel with the bonding driver
---------------------------------------
For the latest version of the bonding driver, use kernel 2.4.12 or above
(otherwise you will need to apply a patch).

Configure kernel with `make menuconfig/xconfig/config', and select "Bonding
driver support" in the "Network device support" section. It is recommended
to configure the driver as a module since it is currently the only way to
pass parameters to the driver and configure more than one bonding device.

Build and install the new kernel and modules.
 
2) Get and install the userspace tools
--------------------------------------
This version of the bonding driver requires an updated ifenslave program. The
original one from extreme-linux and beowulf will not work. Kernels 2.4.12
and above include the updated version of ifenslave.c in the
Documentation/networking directory. For older kernels, please follow the
links at the end of this file.

IMPORTANT!!!  If you are running on Red Hat 7.1 or greater, you need
to be careful because /usr/include/linux is no longer a symbolic link
to /usr/src/linux/include/linux.  If you build ifenslave while this is
true, ifenslave will appear to succeed but your bond won't work.  The purpose
of the -I option on the ifenslave compile line is to make sure it uses
/usr/src/linux/include/linux/if_bonding.h instead of the version from
/usr/include/linux.

To install ifenslave.c, do:
    # gcc -Wall -Wstrict-prototypes -O -I/usr/src/linux/include ifenslave.c -o ifenslave
    # cp ifenslave /sbin/ifenslave
 
Bond Configuration
==================

You will need to add at least the following line to /etc/modules.conf
so the bonding driver will automatically load when the bond0 interface is
configured. Refer to the modules.conf manual page for specific modules.conf
syntax details. The Module Parameters section of this document describes each
bonding driver parameter.

        alias bond0 bonding

Use standard distribution techniques to define the bond0 network interface. For
example, on modern Red Hat distributions, create an ifcfg-bond0 file in
the /etc/sysconfig/network-scripts directory that resembles the following:

DEVICE=bond0
IPADDR=192.168.1.1
NETMASK=255.255.255.0
NETWORK=192.168.1.0
BROADCAST=192.168.1.255
ONBOOT=yes
BOOTPROTO=none
USERCTL=no

(use appropriate values for your network above)

All interfaces that are part of a bond should have SLAVE and MASTER
definitions. For example, in the case of Red Hat, if you wish to make eth0 and
eth1 a part of the bonding interface bond0, their config files (ifcfg-eth0 and
ifcfg-eth1) should resemble the following:

DEVICE=eth0
USERCTL=no
ONBOOT=yes
MASTER=bond0
SLAVE=yes
BOOTPROTO=none

Use DEVICE=eth1 in the ifcfg-eth1 config file. If you configure a second
bonding interface (bond1), use MASTER=bond1 in the config file to make the
network interface a slave of bond1.

Restart the networking subsystem or just bring up the bonding device if your
administration tools allow it. Otherwise, reboot. On Red Hat distros you can
issue `ifup bond0' or `/etc/rc.d/init.d/network restart'.
 
If the administration tools of your distribution do not support
master/slave notation in configuring network interfaces, you will need to
manually configure the bonding device with the following commands:

    # /sbin/ifconfig bond0 192.168.1.1 netmask 255.255.255.0 \
      broadcast 192.168.1.255 up

    # /sbin/ifenslave bond0 eth0
    # /sbin/ifenslave bond0 eth1

(use appropriate values for your network above)

You can then create a script containing these commands and place it in the
appropriate rc directory.
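
As a rough sketch of such a script, the helper below only prints the
ifconfig and ifenslave commands for a bond (a dry run), so it can be reviewed
before being piped to a root shell. The function name bond_up_commands and
its argument order are illustrative assumptions, not part of any tool.

```shell
#!/bin/sh
# bond_up_commands BOND ADDR NETMASK BROADCAST SLAVE...
# Prints the manual configuration commands; it does not run them.
# (Hypothetical helper name, used here for illustration only.)
bond_up_commands() {
    bond=$1; addr=$2; mask=$3; bcast=$4
    shift 4
    echo "/sbin/ifconfig $bond $addr netmask $mask broadcast $bcast up"
    # Enslave the remaining arguments in the order given.
    for slave in "$@"; do
        echo "/sbin/ifenslave $bond $slave"
    done
}

bond_up_commands bond0 192.168.1.1 255.255.255.0 192.168.1.255 eth0 eth1
```

To apply the commands rather than print them, the output can be piped to
`sh' as root once the values have been checked.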
 
If you specifically need all network drivers loaded before the bonding driver,
adding the following line to modules.conf will cause the network driver for
eth0 and eth1 to be loaded before the bonding driver.

        probeall bond0 eth0 eth1 bonding

Be careful not to reference bond0 itself at the end of the line, or modprobe
will die in an endless recursive loop.
 
If running SNMP agents, the bonding driver should be loaded before any network
drivers participating in a bond. This requirement is due to the interface
index (ipAdEntIfIndex) being associated with the first interface found with a
given IP address. That is, there is only one ipAdEntIfIndex for each IP
address. For example, if eth0 and eth1 are slaves of bond0 and the driver for
eth0 is loaded before the bonding driver, the interface for the IP address
will be associated with the eth0 interface. This configuration is shown below:
the IP address 192.168.1.1 has an interface index of 2, which indexes to eth0
in the ifDescr table (ifDescr.2).

     interfaces.ifTable.ifEntry.ifDescr.1 = lo
     interfaces.ifTable.ifEntry.ifDescr.2 = eth0
     interfaces.ifTable.ifEntry.ifDescr.3 = eth1
     interfaces.ifTable.ifEntry.ifDescr.4 = eth2
     interfaces.ifTable.ifEntry.ifDescr.5 = eth3
     interfaces.ifTable.ifEntry.ifDescr.6 = bond0
     ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.10.10.10 = 5
     ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.192.168.1.1 = 2
     ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.74.20.94 = 4
     ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.127.0.0.1 = 1

This problem is avoided by loading the bonding driver before any network
drivers participating in a bond. Below is an example of loading the bonding
driver first; the IP address 192.168.1.1 is now correctly associated with
ifDescr.2, which is bond0.

     interfaces.ifTable.ifEntry.ifDescr.1 = lo
     interfaces.ifTable.ifEntry.ifDescr.2 = bond0
     interfaces.ifTable.ifEntry.ifDescr.3 = eth0
     interfaces.ifTable.ifEntry.ifDescr.4 = eth1
     interfaces.ifTable.ifEntry.ifDescr.5 = eth2
     interfaces.ifTable.ifEntry.ifDescr.6 = eth3
     ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.10.10.10 = 6
     ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.192.168.1.1 = 2
     ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.74.20.94 = 5
     ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.127.0.0.1 = 1

While some distributions may not report the interface name in ifDescr,
the association between the IP address and IfIndex remains and SNMP
functions such as Interface_Scan_Next will report that association.
 
Module Parameters
=================

Optional parameters for the bonding driver can be supplied as command line
arguments to the insmod command. Typically, these parameters are specified in
the file /etc/modules.conf (see the manual page for modules.conf). The
available bonding driver parameters are listed below. If a parameter is not
specified, the default value is used. When initially configuring a bond, it
is recommended that "tail -f /var/log/messages" be run in a separate window to
watch for bonding driver error messages.

It is critical that either the miimon parameter or both the arp_interval and
arp_ip_target parameters be specified; otherwise serious network degradation
will occur during link failures.
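
The rule above can be checked mechanically before a configuration is
deployed. The sketch below takes an option string as it would appear after
"options bond0" and reports whether some form of link monitoring is enabled.
The function name has_link_monitor is a hypothetical helper, and the check
only looks at the three parameters named in the rule.

```shell
# has_link_monitor "OPTION=VALUE ..."
# Returns 0 if miimon is non-zero, or if arp_interval is non-zero and
# arp_ip_target is set; returns non-zero otherwise.
# (Hypothetical helper, not part of the bonding tools.)
has_link_monitor() {
    miimon=0; arp_interval=0; arp_ip_target=
    for opt in $1; do
        case $opt in
            miimon=*)        miimon=${opt#*=} ;;
            arp_interval=*)  arp_interval=${opt#*=} ;;
            arp_ip_target=*) arp_ip_target=${opt#*=} ;;
        esac
    done
    [ "$miimon" -gt 0 ] && return 0
    [ "$arp_interval" -gt 0 ] && [ -n "$arp_ip_target" ]
}

# Example: warn about an option line that enables no monitoring at all.
has_link_monitor "mode=1" || echo "warning: no link monitoring configured"
```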
 
arp_interval

        Specifies the ARP monitoring frequency in milliseconds.
        If ARP monitoring is used in a load-balancing mode (mode 0 or 2), the
        switch should be configured in a mode that evenly distributes packets
        across all links, such as round-robin. If the switch is configured to
        distribute the packets in an XOR fashion, all replies from the ARP
        targets will be received on the same link, which could cause the other
        team members to fail. ARP monitoring should not be used in conjunction
        with miimon. A value of 0 disables ARP monitoring. The default value
        is 0.

arp_ip_target

        Specifies the IP addresses to use when arp_interval is > 0. These
        are the targets of the ARP requests sent to determine the health of
        the link to the targets. Specify these values in ddd.ddd.ddd.ddd
        format. Multiple IP addresses must be separated by a comma. At least
        one IP address needs to be given for ARP monitoring to work. The
        maximum number of targets that can be specified is set at 16.
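
        The constraints on arp_ip_target (dotted-quad format, comma
        separated, between 1 and 16 entries) can be checked before the
        module is loaded. The following is a minimal sketch; the function
        name check_arp_targets is a hypothetical helper and the octet range
        check (0 to 255) is an added assumption.

```shell
# check_arp_targets LIST
# Returns 0 if LIST holds 1 to 16 comma-separated ddd.ddd.ddd.ddd addresses.
# (Hypothetical helper, not part of the bonding tools.)
check_arp_targets() {
    printf '%s\n' "$1" | awk -F, '
        NF < 1 || NF > 16 { exit 1 }          # too few or too many targets
        {
            for (i = 1; i <= NF; i++) {
                n = split($i, octet, ".")
                if (n != 4) exit 1            # not four dotted fields
                for (j = 1; j <= 4; j++)
                    if (octet[j] !~ /^[0-9]+$/ || octet[j] > 255) exit 1
            }
        }'
}

check_arp_targets 192.168.0.1,192.168.0.3,192.168.0.9 && echo "targets ok"
```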
 
downdelay

        Specifies the delay time in milliseconds to disable a link after a
        link failure has been detected. This should be a multiple of the
        miimon value, otherwise the value will be rounded. The default value
        is 0.
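
        To illustrate the rounding, the sketch below computes the effective
        delay for a given miimon value. It assumes the value is rounded
        down to a whole number of miimon intervals; this document only says
        "rounded", so treat the direction as an assumption. The function
        name round_delay is hypothetical.

```shell
# round_delay DELAY MIIMON
# Prints DELAY rounded down to a multiple of MIIMON (both in milliseconds).
# Rounding down is an assumption of this sketch.
round_delay() {
    delay=$1; miimon=$2
    if [ "$miimon" -le 0 ]; then
        # MII monitoring disabled: nothing to round against.
        echo "$delay"
        return
    fi
    echo $(( delay / miimon * miimon ))
}

round_delay 250 100   # prints 200
round_delay 300 100   # prints 300
```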
 
lacp_rate

        Option specifying the rate at which we'll ask our link partner to
        transmit LACPDU packets in 802.3ad mode.  Possible values are:

        slow or 0
                Request partner to transmit LACPDUs every 30 seconds (default)

        fast or 1
                Request partner to transmit LACPDUs every 1 second

max_bonds

        Specifies the number of bonding devices to create for this
        instance of the bonding driver.  E.g., if max_bonds is 3, and
        the bonding driver is not already loaded, then bond0, bond1
        and bond2 will be created.  The default value is 1.

miimon

        Specifies the frequency in milliseconds that MII link monitoring
        will occur. A value of zero disables MII link monitoring. A value
        of 100 is a good starting point. See the High Availability section
        for additional information. The default value is 0.
 
mode

        Specifies one of the bonding policies. The default is
        round-robin (balance-rr).  Possible values are (you can use
        either the text or numeric option):

        balance-rr or 0

                Round-robin policy: Transmit in a sequential order
                from the first available slave through the last. This
                mode provides load balancing and fault tolerance.

        active-backup or 1

                Active-backup policy: Only one slave in the bond is
                active. A different slave becomes active if, and only
                if, the active slave fails. The bond's MAC address is
                externally visible on only one port (network adapter)
                to avoid confusing the switch.  This mode provides
                fault tolerance.

        balance-xor or 2

                XOR policy: Transmit based on [(source MAC address
                XOR'd with destination MAC address) modulo slave
                count]. This selects the same slave for each
                destination MAC address. This mode provides load
                balancing and fault tolerance.
 
        broadcast or 3

                Broadcast policy: transmits everything on all slave
                interfaces. This mode provides fault tolerance.

        802.3ad or 4

                IEEE 802.3ad Dynamic link aggregation. Creates aggregation
                groups that share the same speed and duplex settings.
                Transmits and receives on all slaves in the active
                aggregator.

                Prerequisites:

                1. Ethtool support in the base drivers for retrieving the
                speed and duplex of each slave.

                2. A switch that supports IEEE 802.3ad Dynamic link
                aggregation.

        balance-tlb or 5

                Adaptive transmit load balancing: channel bonding that does
                not require any special switch support. The outgoing
                traffic is distributed according to the current load
                (computed relative to the speed) on each slave. Incoming
                traffic is received by the current slave. If the receiving
                slave fails, another slave takes over the MAC address of
                the failed receiving slave.

                Prerequisite:

                Ethtool support in the base drivers for retrieving the
                speed of each slave.
 
        balance-alb or 6

                Adaptive load balancing: includes balance-tlb + receive
                load balancing (rlb) for IPv4 traffic and does not require
                any special switch support. The receive load balancing is
                achieved by ARP negotiation. The bonding driver intercepts
                the ARP Replies sent by the server on their way out and
                overwrites the src hw address with the unique hw address of
                one of the slaves in the bond such that different clients
                use different hw addresses for the server.

                Receive traffic from connections created by the server is
                also balanced. When the server sends an ARP Request the
                bonding driver copies and saves the client's IP information
                from the ARP. When the ARP Reply arrives from the client,
                its hw address is retrieved and the bonding driver
                initiates an ARP reply to this client assigning it to one
                of the slaves in the bond. A problematic outcome of using
                ARP negotiation for balancing is that each time that an ARP
                request is broadcast it uses the hw address of the
                bond. Hence, clients learn the hw address of the bond and
                the balancing of receive traffic collapses to the current
                slave. This is handled by sending updates (ARP Replies) to
                all the clients with their assigned hw address such that
                the traffic is redistributed. Receive traffic is also
                redistributed when a new slave is added to the bond and
                when an inactive slave is re-activated. The receive load is
                distributed sequentially (round robin) among the group of
                highest speed slaves in the bond.

                When a link is reconnected or a new slave joins the bond
                the receive traffic is redistributed among all active
                slaves in the bond by initiating ARP Replies with the
                selected MAC address to each of the clients. The updelay
                modprobe parameter must be set to a value equal to or
                greater than the switch's forwarding delay so that the ARP
                Replies sent to the clients will not be blocked by the
                switch.

                Prerequisites:

                1. Ethtool support in the base drivers for retrieving the
                speed of each slave.

                2. Base driver support for setting the hw address of a
                device also when it is open. This is required so that there
                will always be one slave in the team using the bond hw
                address (the curr_active_slave) while having a unique hw
                address for each slave in the bond. If the curr_active_slave
                fails, its hw address is swapped with the new
                curr_active_slave that was chosen.
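
        As a worked example of the XOR policy (balance-xor) described
        above, the sketch below picks a slave index from a source and
        destination MAC address. For simplicity it XORs only the last octet
        of each address; that simplification and the function name
        xor_slave are assumptions of this sketch, not a statement of how
        the driver is implemented.

```shell
# xor_slave SRC_MAC DST_MAC SLAVE_COUNT
# Prints a zero-based slave index: (src XOR dst) modulo SLAVE_COUNT,
# using only the last octet of each MAC address (a simplifying assumption).
xor_slave() {
    src_last=${1##*:}    # text after the final colon, e.g. B4
    dst_last=${2##*:}
    echo $(( (0x$src_last ^ 0x$dst_last) % $3 ))
}

# 0xB4 XOR 0x55 = 0xE1 = 225, and 225 modulo 2 = 1
xor_slave 00:C0:F0:1F:37:B4 00:11:22:33:44:55 2   # prints 1
```

        Because the result depends only on the two addresses and the slave
        count, every frame to a given destination leaves on the same slave
        for as long as the slave count does not change.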
 
primary

        A string (eth0, eth2, etc.) naming the primary device. If this
        value is entered, and the device is on-line, it will be used first
        as the output media. Only when this device is off-line will
        alternate devices be used. Otherwise, once a failover is detected
        and a new default output is chosen, it will remain the output media
        until it too fails. This is useful when one slave is preferred
        over another, e.g. when one slave is 1000Mbps and another is
        100Mbps. If the 1000Mbps slave fails and is later restored, it may
        be preferable for the faster slave to gracefully become the active
        slave, without deliberately failing the 100Mbps slave. Specifying a
        primary is only valid in active-backup mode.

updelay

        Specifies the delay time in milliseconds to enable a link after a
        link up status has been detected. This should be a multiple of the
        miimon value, otherwise the value will be rounded. The default value
        is 0.
 
use_carrier

        Specifies whether or not miimon should use MII or ETHTOOL
        ioctls vs. netif_carrier_ok() to determine the link status.
        The MII or ETHTOOL ioctls are less efficient and utilize a
        deprecated calling sequence within the kernel.  The
        netif_carrier_ok() relies on the device driver to maintain its
        state with netif_carrier_on/off; at this writing, most, but
        not all, device drivers support this facility.

        If bonding insists that the link is up when it should not be,
        it may be that your network device driver does not support
        netif_carrier_on/off.  This is because the default state for
        netif_carrier is "carrier on." In this case, disabling
        use_carrier will cause bonding to revert to the MII / ETHTOOL
        ioctl method to determine the link state.

        A value of 1 enables the use of netif_carrier_ok(), a value of
        0 will use the deprecated MII / ETHTOOL ioctls.  The default
        value is 1.
 
Configuring Multiple Bonds
==========================

If several bonding interfaces are required, either specify the max_bonds
parameter (described above), or load the driver multiple times.  Using
the max_bonds parameter is less complicated, but has the limitation that
all bonding instances created will have the same options.  Loading the
driver multiple times allows each instance of the driver to have differing
options.

For example, to configure two bonding interfaces, one with MII link
monitoring performed every 100 milliseconds, and one with ARP link
monitoring performed every 200 milliseconds, the /etc/modules.conf should
resemble the following:

alias bond0 bonding
alias bond1 bonding

options bond0 miimon=100
options bond1 -o bonding1 arp_interval=200 arp_ip_target=10.0.0.1
 
Configuring Multiple ARP Targets
================================

While ARP monitoring can be done with just one target, it can be useful
in a High Availability setup to have several targets to monitor. In the
case of just one target, the target itself may go down or have a problem
making it unresponsive to ARP requests. Having an additional target (or
several) increases the reliability of the ARP monitoring.

Multiple ARP targets must be separated by commas as follows:

# example options for ARP monitoring with three targets
alias bond0 bonding
options bond0 arp_interval=60 arp_ip_target=192.168.0.1,192.168.0.3,192.168.0.9

For just a single target the options would resemble:

# example options for ARP monitoring with one target
alias bond0 bonding
options bond0 arp_interval=60 arp_ip_target=192.168.0.100
 
Potential Problems When Using ARP Monitor
=========================================

1. Driver support

The ARP monitor relies on the network device driver to maintain two
statistics: the last receive time (dev->last_rx), and the last
transmit time (dev->trans_start).  If the network device driver does
not update one or both of these, then the typical result will be that,
upon startup, all links in the bond will immediately be declared down,
and remain that way.  A network monitoring tool (e.g., tcpdump) will
show ARP requests and replies being sent and received on the bonding
device.

The possible resolutions for this are to (a) fix the device driver, or
(b) discontinue the ARP monitor (using miimon as an alternative, for
example).
 
2. Adventures in Routing

When bonding is set up with the ARP monitor, it is important that the
slave devices not have routes that supersede routes of the master (or,
generally, not have routes at all).  For example, suppose the bonding
device bond0 has two slaves, eth0 and eth1, and the routing table is
as follows:

Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
10.0.0.0        0.0.0.0         255.255.0.0     U        40 0          0 eth0
10.0.0.0        0.0.0.0         255.255.0.0     U        40 0          0 eth1
10.0.0.0        0.0.0.0         255.255.0.0     U        40 0          0 bond0
127.0.0.0       0.0.0.0         255.0.0.0       U        40 0          0 lo

In this case, the ARP monitor (and ARP itself) may become confused,
because ARP requests will be sent on one interface (bond0), but the
corresponding reply will arrive on a different interface (eth0).  This
reply looks to ARP like an unsolicited ARP reply (because ARP matches
replies on an interface basis), and is discarded.  This will likely
still update the receive/transmit times in the driver, but will lose
packets.

The resolution here is simply to ensure that slaves do not have routes
of their own, and if for some reason they must, those routes do not
supersede routes of their master.  This should generally be the case,
but unusual configurations or errant manual or automatic static route
additions may cause trouble.
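
One way to spot the problem is to filter the routing table for entries that
name a slave in the Iface column. The sketch below reads routing table text
in the layout shown above on standard input and prints the offending lines.
The function name slave_routes is a hypothetical helper, and it assumes the
interface name is the last field on each line.

```shell
# slave_routes SLAVE...
# Reads a routing table (as printed above) on stdin and prints every line
# whose last field, the Iface column, is one of the named slaves.
# (Hypothetical helper, not part of the bonding tools.)
slave_routes() {
    awk -v slaves="$*" '
        BEGIN {
            n = split(slaves, s, " ")
            for (i = 1; i <= n; i++) is_slave[s[i]] = 1
        }
        ($NF in is_slave) { print }
    '
}

# Typical use, assuming the net-tools route command is installed:
#   /sbin/route -n | slave_routes eth0 eth1
```

With the example table above, the two lines for eth0 and eth1 would be
printed; an empty result means no slave has a route of its own.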
 
Switch Configuration
====================

While the switch does not need to be configured when the active-backup,
balance-tlb or balance-alb policies (mode=1,5,6) are used, it does need to
be configured for the round-robin, XOR, broadcast, or 802.3ad policies
(mode=0,2,3,4).
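
The mapping above is small enough to encode directly, which can be handy in
a deployment script. The following sketch uses the mode names and numbers
from the Module Parameters section; the function name needs_switch_config
and its return codes are illustrative choices.

```shell
# needs_switch_config MODE
# Returns 0 if the switch must be configured for MODE, 1 if it need not
# be, and 2 if MODE is not a name or number listed in this document.
# (Hypothetical helper, not part of the bonding tools.)
needs_switch_config() {
    case $1 in
        balance-rr|0|balance-xor|2|broadcast|3|802.3ad|4) return 0 ;;
        active-backup|1|balance-tlb|5|balance-alb|6)      return 1 ;;
        *)                                                return 2 ;;
    esac
}

if needs_switch_config 802.3ad; then
    echo "configure the switch ports before bringing up the bond"
fi
```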
 
Verifying Bond Configuration
============================

1) Bonding information files
----------------------------
The bonding driver information files reside in the /proc/net/bonding directory.

Sample contents of /proc/net/bonding/bond0 after the driver is loaded with
parameters of mode=0 and miimon=1000 are shown below.

        Bonding Mode: load balancing (round-robin)
        Currently Active Slave: eth0
        MII Status: up
        MII Polling Interval (ms): 1000
        Up Delay (ms): 0
        Down Delay (ms): 0

        Slave Interface: eth1
        MII Status: up
        Link Failure Count: 1

        Slave Interface: eth0
        MII Status: up
        Link Failure Count: 1
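
For scripted checks, the per-slave status can be pulled out of this file
with a short awk filter. The sketch below assumes the "Name: value" layout
of the sample above; the function name slave_status is a hypothetical
helper, and other driver versions may print additional fields.

```shell
# slave_status [FILE]
# Prints "<slave> <MII status>" for each slave section of a bonding
# information file. FILE defaults to /proc/net/bonding/bond0.
# (Hypothetical helper; assumes the layout of the sample above.)
slave_status() {
    awk -F': ' '
        # A "Slave Interface" line opens a new slave section.
        $1 ~ /Slave Interface$/ { slave = $2 }
        # The first "MII Status" line after it belongs to that slave;
        # the bond-level status line appears before any slave and is skipped.
        $1 ~ /MII Status$/ && slave != "" { print slave, $2; slave = "" }
    ' "${1:-/proc/net/bonding/bond0}"
}

# Usage on a system with the bonding driver loaded:
#   slave_status | grep -v ' up$'    # list slaves whose link is not up
```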
 
2) Network verification
-----------------------
The network configuration can be verified using the ifconfig command. In
the example below, the bond0 interface is the master (MASTER) while eth0 and
eth1 are slaves (SLAVE). Notice that all slaves of bond0 have the same MAC
address (HWaddr) as bond0. This holds for all modes except TLB and ALB, which
require a unique MAC address for each slave.

[root]# /sbin/ifconfig
bond0     Link encap:Ethernet  HWaddr 00:C0:F0:1F:37:B4
          inet addr:XXX.XXX.XXX.YYY  Bcast:XXX.XXX.XXX.255  Mask:255.255.252.0
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:7224794 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3286647 errors:1 dropped:0 overruns:1 carrier:0
          collisions:0 txqueuelen:0

eth0      Link encap:Ethernet  HWaddr 00:C0:F0:1F:37:B4
          inet addr:XXX.XXX.XXX.YYY  Bcast:XXX.XXX.XXX.255  Mask:255.255.252.0
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:3573025 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1643167 errors:1 dropped:0 overruns:1 carrier:0
          collisions:0 txqueuelen:100
          Interrupt:10 Base address:0x1080

eth1      Link encap:Ethernet  HWaddr 00:C0:F0:1F:37:B4
          inet addr:XXX.XXX.XXX.YYY  Bcast:XXX.XXX.XXX.255  Mask:255.255.252.0
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:3651769 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1643480 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          Interrupt:9 Base address:0x1400
 
Frequently Asked Questions
==========================

1.  Is it SMP safe?

        Yes. The old 2.0.xx channel bonding patch was not SMP safe.
        The new driver was designed to be SMP safe from the start.

2.  What type of cards will work with it?

        Any Ethernet type cards (you can even mix cards - an Intel
        EtherExpress PRO/100 and a 3Com 3c905b, for example).
        You can even bond together Gigabit Ethernet cards!

3.  How many bonding devices can I have?

        There is no limit.

4.  How many slaves can a bonding device have?

        Limited by the number of network interfaces Linux supports and/or the
        number of network cards you can place in your system.
 
5.  What happens when a slave link dies?

        If your ethernet cards support MII or ETHTOOL link status monitoring
        and the MII monitoring has been enabled in the driver (see description
        of module parameters), there will be no adverse consequences. This
        release of the bonding driver knows how to get the MII information and
        enables or disables its slaves according to their link status.
        See section on High Availability for additional information.

        For ethernet cards not supporting MII status, the arp_interval and
        arp_ip_target parameters must be specified for bonding to work
        correctly. If packets have not been sent or received during the
        specified arp_interval duration, an ARP request is sent to the
        targets to generate send and receive traffic. If, after this
        interval, the successful send and/or receive count has not
        incremented, the next slave in the sequence will become the active
        slave.

        If neither miimon nor arp_interval is configured, the bonding
        driver will not handle this situation very well. The driver will
        continue to send packets but some packets will be lost. Retransmits
        will cause serious degradation of performance (in the case when one
        of two slave links fails, 50% of packets will be lost, which is a
        serious problem for both TCP and UDP).

6.  Can bonding be used for High Availability?

        Yes, if you use MII monitoring and ALL your cards support MII link
        status reporting. See section on High Availability for more
        information.
 
7.  Which switches/systems does it work with?

        In round-robin and XOR mode, it works with systems that support
        trunking:

        * Many Cisco switches and routers (look for EtherChannel support).
        * SunTrunking software.
        * Alteon AceDirector switches / WebOS (use Trunks).
        * BayStack Switches (trunks must be explicitly configured). Stackable
          models (450) can define trunks between ports on different physical
          units.
        * Linux bonding, of course!

        In 802.3ad mode, it works with systems that support IEEE 802.3ad
        Dynamic Link Aggregation:

        * Extreme networks Summit 7i (look for link-aggregation).
        * Many Cisco switches and routers (look for LACP support; this may
          require an upgrade to your IOS software; LACP support was added
          by Cisco in late 2002).
        * Foundry Big Iron 4000

        In active-backup, balance-tlb and balance-alb modes, it should work
        with any Layer-II switch.
 
8.  Where does a bonding device get its MAC address from?

        If not explicitly configured with ifconfig, the MAC address of the
        bonding device is taken from its first slave device. This MAC address
        is then passed to all following slaves and remains persistent (even if
        the first slave is removed) until the bonding device is brought
        down or reconfigured.

        If you wish to change the MAC address, you can set it with ifconfig:

          # ifconfig bond0 hw ether 00:11:22:33:44:55

        The MAC address can also be changed by bringing down/up the device
        and then changing its slaves (or their order):

          # ifconfig bond0 down ; modprobe -r bonding
          # ifconfig bond0 .... up
          # ifenslave bond0 eth...

        This method will automatically take the address from the next slave
        that will be added.

        To restore your slaves' MAC addresses, you need to detach them
        from the bond (`ifenslave -d bond0 eth0'). The bonding driver will then
        restore the MAC addresses that the slaves had before they were enslaved.
 
9.  Which transmit policies can be used?

        Round-robin, based on the order of enslaving: the output device
        is selected based on the next available slave, regardless of
        the source and/or destination of the packet.

        Active-backup policy, which ensures that one and only one device will
        transmit at any given moment. Active-backup policy is useful for
        implementing high availability solutions using two hubs (see
        section on High Availability).

        XOR, based on (src hw addr XOR dst hw addr) % slave count. This
        policy selects the same slave for each destination hw address.

        Broadcast policy, which transmits everything on all slave interfaces.

        802.3ad, based on XOR but distributes traffic among all interfaces
        in the active aggregator.

        Transmit load balancing (balance-tlb) balances the traffic
        according to the current load on each slave. The balancing is
        client based, and the least loaded slave is selected for each new
        client. The load of each slave is calculated relative to its speed,
        which enables load balancing in mixed speed teams.

        Adaptive load balancing (balance-alb) uses transmit load
        balancing for the transmit load. The receive load is balanced only
        among the group of highest speed active slaves in the bond. The
        load is distributed round-robin, i.e. to the next available slave in
        the high speed group of active slaves.
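
        As a rough illustration of the round-robin and XOR policies above,
        the following shell sketch computes a slave index. The helper names
        (mac_to_int, xor_slave, rr_next) are hypothetical and not part of
        the driver or of ifenslave. Treating each hardware address as one
        48-bit integer is an assumption made for illustration; the driver
        may use only part of the address.

```shell
# Illustrative helpers only; requires a shell with 64-bit arithmetic.

# mac_to_int MAC: print a colon-separated MAC address as a decimal integer.
mac_to_int() {
    # Strip the colons and let shell arithmetic parse the hex digits.
    echo $(( 0x$(echo "$1" | tr -d ':') ))
}

# xor_slave SRC_MAC DST_MAC SLAVE_COUNT: print the index of the chosen slave,
# computed as (src hw addr XOR dst hw addr) % slave count.
xor_slave() {
    src=$(mac_to_int "$1")
    dst=$(mac_to_int "$2")
    echo $(( ($src ^ $dst) % $3 ))
}

# rr_next CURRENT_INDEX SLAVE_COUNT: print the index of the next slave,
# following the order of enslaving and wrapping around at the end.
rr_next() {
    echo $(( ($1 + 1) % $2 ))
}
```

        With two slaves, "xor_slave 00:11:22:33:44:55 00:11:22:33:44:56 2"
        prints 1, and it prints 1 every time for that destination, which
        is why XOR keeps each peer on a single slave.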

High Availability
=================

To implement high availability using the bonding driver, the driver needs to be
compiled as a module, because currently it is the only way to pass parameters
to the driver. This may change in the future.

High availability is achieved by using MII or ETHTOOL status reporting. You
need to verify that all your interfaces support MII or ETHTOOL link status
reporting.  On Linux kernel 2.2.17, all the 100 Mbps capable drivers and
the yellowfin gigabit driver support MII. To determine if ETHTOOL link
reporting is available for interface eth0, type "ethtool eth0"; the
"Link detected:" line should contain the correct link status. If your system
has an interface that does not support MII or ETHTOOL status reporting, a
failure of its link will not be detected! A message indicating that MII and
ETHTOOL are not supported by a network driver is logged when the bonding
driver is loaded with a non-zero miimon value.
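
The "Link detected:" check described above can be scripted. The following is
a minimal sketch; the function name link_detected is hypothetical, and it
assumes the status line has the form "Link detected: yes" or
"Link detected: no".

```shell
# link_detected: read the output of "ethtool <interface>" on stdin.
# Succeeds only if a "Link detected:" line is present and reports "yes".
link_detected() {
    awk -F': *' '
        /Link detected:/ { found = 1; up = ($2 ~ /^yes/) }
        END              { exit !(found && up) }
    '
}
```

For example, "ethtool eth0 | link_detected && echo eth0 link is up". The
function also fails when the line is missing, which is the case for a driver
that does not report link status through ETHTOOL.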

The bonding driver can regularly check all its slaves' links using the ETHTOOL
IOCTL (ETHTOOL_GLINK command) or by checking the MII status registers. The
check interval is specified by the module argument "miimon" (MII monitoring).
It takes an integer that represents the checking time in milliseconds. It
should not come too close to (1000/HZ) (10 milliseconds on i386) because it
may then reduce the system interactivity. A value of 100 seems to be a good
starting point. It means that a dead link will be detected at most 100
milliseconds after it goes down.

Example:

   # modprobe bonding miimon=100

Or, put the following lines in /etc/modules.conf:

   alias bond0 bonding
   options bond0 miimon=100

There are currently two policies for high availability. They are dependent on
whether:

   a) hosts are connected to a single host or switch that supports trunking

   b) hosts are connected to several different switches or a single switch that
      does not support trunking


1) High Availability on a single switch or host - load balancing
----------------------------------------------------------------
This is the easiest policy to set up and to understand. Simply configure the
remote equipment (host or switch) to aggregate traffic over several
ports (Trunk, EtherChannel, etc.) and configure the bonding interfaces.
If the module has been loaded with the proper MII option, it will work
automatically. You can then try to remove and restore different links
and see in your logs what the driver detects. When testing, you may
encounter problems on some buggy switches that disable the trunk for a
long time if all ports in a trunk go down. This is not a Linux problem but
a switch problem (reboot the switch to make sure).

Example 1 : host to host at twice the speed

          +----------+                          +----------+
          |          |eth0                  eth0|          |
          | Host A   +--------------------------+  Host B  |
          |          +--------------------------+          |
          |          |eth1                  eth1|          |
          +----------+                          +----------+

  On each host :
     # modprobe bonding miimon=100
     # ifconfig bond0 addr
     # ifenslave bond0 eth0 eth1

Example 2 : host to switch at twice the speed

          +----------+                          +----------+
          |          |eth0                 port1|          |
          | Host A   +--------------------------+  switch  |
          |          +--------------------------+          |
          |          |eth1                 port2|          |
          +----------+                          +----------+

  On host A :                             On the switch :
     # modprobe bonding miimon=100           # set up a trunk on port1
     # ifconfig bond0 addr                     and port2
     # ifenslave bond0 eth0 eth1


2) High Availability on two or more switches (or a single switch without
   trunking support)
---------------------------------------------------------------------------
This mode is more problematic because there are multiple ports, and the
host's MAC address must be visible on only one of them to avoid confusing
the switches.

If you need to know which interface is the active one, and which ones are
backups, use ifconfig. All backup interfaces have the NOARP flag set.
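
The NOARP check can be automated with a short filter. This is only a sketch:
the function name backup_slaves is hypothetical, and it assumes the classic
ifconfig layout in which each interface name starts in the first column and
the flags appear on an indented line below it.

```shell
# backup_slaves: read ifconfig output on stdin and print the name of every
# interface whose flags include NOARP (the backup slaves of a bond).
backup_slaves() {
    awk '
        /^[^ ]/ { name = $1 }     # a line starting in column one names an interface
        /NOARP/ { print name }    # the flag line of a backup interface
    '
}
```

For example, "ifconfig | backup_slaves" lists the backup interfaces; the
enslaved interface that is not listed is the active one.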

To use this mode, pass "mode=active-backup" (or the equivalent "mode=1") to
the module at load time :

    # modprobe bonding miimon=100 mode=active-backup

        or:

    # modprobe bonding miimon=100 mode=1

Or, put in your /etc/modules.conf :

    alias bond0 bonding
    options bond0 miimon=100 mode=active-backup

Example 1: Using multiple hosts and multiple switches to build a "no single
point of failure" solution.


                |                                     |
                |port3                           port3|
          +-----+----+                          +-----+----+
          |          |port7       ISL      port7|          |
          | switch A +--------------------------+ switch B |
          |          +--------------------------+          |
          |          |port8                port8|          |
          +----++----+                          +-----++---+
          port2||port1                           port1||port2
               ||             +-------+               ||
               |+-------------+ host1 +---------------+|
               |         eth0 +-------+ eth1           |
               |                                       |
               |              +-------+                |
               +--------------+ host2 +----------------+
                         eth0 +-------+ eth1

In this configuration, there is an ISL - Inter Switch Link (could be a trunk),
several servers (host1, host2 ...) each attached to both switches, and one or
more ports to the outside world (port3...). One and only one slave on each host
is active at a time, while all links are still monitored (the system can
detect a failure of active and backup links).

Each time a host changes its active interface, it sticks to the new one until
it goes down. In this example, the hosts are negligibly affected by the
expiration time of the switches' forwarding tables.

If host1 and host2 have the same functionality and are used in load balancing
by another external mechanism, it is good to have host1's active interface
connected to one switch and host2's to the other. Such a system will survive
a failure of a single host, cable, or switch. The worst thing that may happen
in the case of a switch failure is that half of the hosts will be temporarily
unreachable until the other switch expires its tables.

Example 2: Using multiple ethernet cards connected to a switch to configure
           NIC failover (switch is not required to support trunking).


          +----------+                          +----------+
          |          |eth0                 port1|          |
          | Host A   +--------------------------+  switch  |
          |          +--------------------------+          |
          |          |eth1                 port2|          |
          +----------+                          +----------+

  On host A :                                 On the switch :
     # modprobe bonding miimon=100 mode=1     # (optional) minimize the time
     # ifconfig bond0 addr                    # for table expiration
     # ifenslave bond0 eth0 eth1

Each time the host changes its active interface, it sticks to the new one until
it goes down. In this example, the host is strongly affected by the expiration
time of the switch forwarding table.


3) Adapting to your switches' timing
------------------------------------
If your switches take a long time to go into backup mode, it may be
desirable not to activate a backup interface immediately after a link goes
down. It is possible to delay the moment at which a link will be
completely disabled by passing the module parameter "downdelay" (in
milliseconds, must be a multiple of miimon).

When a switch reboots, it is possible that its ports report "link up" status
before they become usable. This could fool a bond device by causing it to
use some ports that are not ready yet. It is possible to delay the moment at
which an active link will be reused by passing the module parameter "updelay"
(in milliseconds, must be a multiple of miimon).

A similar situation can occur when a host re-negotiates a lost link with the
switch (a case of cable replacement).

A special case is when a bonding interface has lost all slave links. Then the
driver will immediately reuse the first link that goes up, even if the updelay
parameter was specified. (If there are slave interfaces in the "updelay" state,
the interface that first went into that state will be immediately reused.) This
reduces down-time if the value of updelay has been overestimated.

Examples :

    # modprobe bonding miimon=100 mode=1 downdelay=2000 updelay=5000
    # modprobe bonding miimon=100 mode=balance-rr downdelay=0 updelay=5000
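
Since downdelay and updelay must be multiples of miimon, it can help to check
the values before loading the module. The helpers below are a sketch with
hypothetical names (is_multiple, round_up). Rounding up to the next multiple
is a choice made here for illustration, not a description of what the driver
does with a value that is not a multiple.

```shell
# is_multiple DELAY MIIMON: succeed if DELAY (ms) is a whole multiple of
# MIIMON (ms). A DELAY of 0 counts as a multiple.
is_multiple() {
    [ "$2" -gt 0 ] && [ $(( $1 % $2 )) -eq 0 ]
}

# round_up DELAY MIIMON: print the smallest multiple of MIIMON that is
# greater than or equal to DELAY.
round_up() {
    echo $(( (($1 + $2 - 1) / $2) * $2 ))
}
```

With miimon=100, "is_multiple 2000 100" succeeds, while "is_multiple 250 100"
fails and "round_up 250 100" prints 300.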


Promiscuous Sniffing notes
==========================

If you wish to bond channels together for a network sniffing
application (for example, to run tcpdump, ethereal, or an IDS like
snort with its input aggregated from multiple interfaces using the
bonding driver), then you need to handle the promiscuous interface
setting by hand. Specifically, when you run "ifconfig bond0 up" you
must add the promisc flag there; it will be propagated down to the
slave interfaces at ifenslave time. A full example might look like:

   grep bond0 /etc/modules.conf || echo alias bond0 bonding >>/etc/modules.conf
   ifconfig bond0 promisc up
   for if in eth1 eth2 ...;do
       ifconfig $if up
       ifenslave bond0 $if
   done
   snort ... -i bond0 ...

Ifenslave also wants to propagate addresses from interface to
interface, as appropriate for its designed functions in HA and channel
capacity aggregation, but it works fine for unnumbered interfaces;
just ignore all the warnings it emits.


8021q VLAN support
==================

It is possible to configure VLAN devices over a bond interface using the 8021q
driver. However, only packets coming from the 8021q driver and passing through
bonding will be tagged by default. Self generated packets, like bonding's
learning packets or ARP packets generated by either ALB mode or the ARP
monitor mechanism, are tagged internally by bonding itself. As a result,
bonding has to "learn" what VLAN IDs are configured on top of it, and it uses
those IDs to tag self generated packets.

For simplicity, and to support the use of adapters that can do VLAN
hardware acceleration offloading, the bonding interface declares itself as
fully hardware offloading capable. It gets the add_vid/kill_vid notifications
to gather the necessary information, and it propagates those actions to the
slaves.
In case of mixed adapter types, hardware accelerated tagged packets that should
go through an adapter that is not offloading capable are "un-accelerated" by the
bonding driver so the VLAN tag sits in the regular location.

VLAN interfaces *must* be added on top of a bonding interface only after
enslaving at least one slave. This is because until the first slave is added the
bonding interface has a HW address of 00:00:00:00:00:00, which will be copied by
the VLAN interface when it is created.

Notice that a problem would occur if all slaves are released from a bond that
still has VLAN interfaces on top of it. When new slaves are added later, the
bonding interface would get a HW address from the first slave, which might not
match that of the VLAN interfaces. It is recommended either to remove all VLANs
and then re-add them, or to manually set the bonding interface's HW
address so it matches the VLAN's. (Note: changing a VLAN interface's HW address
would set the underlying device -- i.e. the bonding interface -- to promiscuous
mode, which might not be what you want).


Limitations
===========
The main limitations are :
  - only the link status is monitored. If the switch on the other side is
    partially down (e.g. doesn't forward anymore, but the link is OK), the link
    won't be disabled. Another way to check for a dead link could be to count
    incoming frames on a heavily loaded host. This is not applicable to small
    servers, but may be useful when the front switches send multicast
    information on their links (e.g. VRRP), or even health-check the servers.
    Use the arp_interval/arp_ip_target parameters to count incoming/outgoing
    frames.



Resources and Links
===================

Current development on this driver is posted to:
 - http://www.sourceforge.net/projects/bonding/

Donald Becker's Ethernet Drivers and diag programs may be found at :
 - http://www.scyld.com/network/

You will also find a lot of information regarding Ethernet, NWay, MII, etc. at
www.scyld.com.

Patches for 2.2 kernels are at Willy Tarreau's site :
 - http://wtarreau.free.fr/pub/bonding/
 - http://www-miaif.lip6.fr/~tarreau/pub/bonding/

To get the latest information about Linux Kernel development, please consult
the Linux Kernel Mailing List Archives at :
   http://www.ussg.iu.edu/hypermail/linux/kernel/

-- END --
