OpenCores
URL https://opencores.org/ocsvn/or1k/or1k/trunk

Subversion Repositories or1k

[/] [or1k/] [trunk/] [linux/] [linux-2.4/] [Documentation/] [networking/] [bonding.txt] - Rev 1765

Compare with Previous | Blame | View Log


                   Linux Ethernet Bonding Driver mini-howto

Initial release : Thomas Davis <tadavis at lbl.gov>
Corrections, HA extensions : 2000/10/03-15 :
  - Willy Tarreau <willy at meta-x.org>
  - Constantine Gavrilov <const-g at xpert.com>
  - Chad N. Tindel <ctindel at ieee dot org>
  - Janice Girouard <girouard at us dot ibm dot com>
  - Jay Vosburgh <fubar at us dot ibm dot com>

Note :
------
The bonding driver originally came from Donald Becker's beowulf patches for
kernel 2.0. It has changed quite a bit since, and the original tools from
extreme-linux and beowulf sites will not work with this version of the driver.

For new versions of the driver, patches for older kernels and the updated
userspace tools, please follow the links at the end of this file.


Table of Contents
=================

Installation
Bond Configuration
Module Parameters
Configuring Multiple Bonds
Switch Configuration
Verifying Bond Configuration
Frequently Asked Questions
High Availability
Promiscuous Sniffing notes
8021q VLAN support
Limitations
Resources and Links


Installation
============

1) Build kernel with the bonding driver
---------------------------------------
For the latest version of the bonding driver, use kernel 2.4.12 or above
(otherwise you will need to apply a patch).

Configure kernel with `make menuconfig/xconfig/config', and select "Bonding
driver support" in the "Network device support" section. It is recommended
to configure the driver as module since it is currently the only way to
pass parameters to the driver and configure more than one bonding device.

Build and install the new kernel and modules.

2) Get and install the userspace tools
--------------------------------------
This version of the bonding driver requires updated ifenslave program. The
original one from extreme-linux and beowulf will not work. Kernels 2.4.12
and above include the updated version of ifenslave.c in Documentation/network
directory. For older kernels, please follow the links at the end of this file.

IMPORTANT!!!  If you are running on Redhat 7.1 or greater, you need
to be careful because /usr/include/linux is no longer a symbolic link
to /usr/src/linux/include/linux.  If you build ifenslave while this is
true, ifenslave will appear to succeed but your bond won't work.  The purpose
of the -I option on the ifenslave compile line is to make sure it uses
/usr/src/linux/include/linux/if_bonding.h instead of the version from
/usr/include/linux.

To install ifenslave.c, do:
    # gcc -Wall -Wstrict-prototypes -O -I/usr/src/linux/include ifenslave.c -o ifenslave
    # cp ifenslave /sbin/ifenslave


Bond Configuration
==================

You will need to add at least the following line to /etc/modules.conf
so the bonding driver will automatically load when the bond0 interface is
configured. Refer to the modules.conf manual page for specific modules.conf
syntax details. The Module Parameters section of this document describes each
bonding driver parameter.

        alias bond0 bonding

Use standard distribution techniques to define the bond0 network interface. For
example, on modern Red Hat distributions, create an ifcfg-bond0 file in
the /etc/sysconfig/network-scripts directory that resembles the following:

DEVICE=bond0
IPADDR=192.168.1.1
NETMASK=255.255.255.0
NETWORK=192.168.1.0
BROADCAST=192.168.1.255
ONBOOT=yes
BOOTPROTO=none
USERCTL=no

(use appropriate values for your network above)

All interfaces that are part of a bond should have SLAVE and MASTER
definitions. For example, in the case of Red Hat, if you wish to make eth0 and
eth1 a part of the bonding interface bond0, their config files (ifcfg-eth0 and
ifcfg-eth1) should resemble the following:

DEVICE=eth0
USERCTL=no
ONBOOT=yes
MASTER=bond0
SLAVE=yes
BOOTPROTO=none

Use DEVICE=eth1 in the ifcfg-eth1 config file. If you configure a second
bonding interface (bond1), use MASTER=bond1 in the config file to make the
network interface be a slave of bond1.

Restart the networking subsystem or just bring up the bonding device if your
administration tools allow it. Otherwise, reboot. On Red Hat distros you can
issue `ifup bond0' or `/etc/rc.d/init.d/network restart'.

If the administration tools of your distribution do not support
master/slave notation in configuring network interfaces, you will need to
manually configure the bonding device with the following commands:

    # /sbin/ifconfig bond0 192.168.1.1 netmask 255.255.255.0 \
      broadcast 192.168.1.255 up

    # /sbin/ifenslave bond0 eth0
    # /sbin/ifenslave bond0 eth1

(use appropriate values for your network above)

You can then create a script containing these commands and place it in the
appropriate rc directory.

If you specifically need all network drivers loaded before the bonding driver,
adding the following line to modules.conf will cause the network driver for
eth0 and eth1 to be loaded before the bonding driver.

probeall bond0 eth0 eth1 bonding

Be careful not to reference bond0 itself at the end of the line, or modprobe
will die in an endless recursive loop.

If running SNMP agents, the bonding driver should be loaded before any network
drivers participating in a bond. This requirement is due to the the interface
index (ipAdEntIfIndex) being associated to the first interface found with a
given IP address. That is, there is only one ipAdEntIfIndex for each IP
address. For example, if eth0 and eth1 are slaves of bond0 and the driver for
eth0 is loaded before the bonding driver, the interface for the IP address
will be associated with the eth0 interface. This configuration is shown below,
the IP address 192.168.1.1 has an interface index of 2 which indexes to eth0
in the ifDescr table (ifDescr.2).

     interfaces.ifTable.ifEntry.ifDescr.1 = lo
     interfaces.ifTable.ifEntry.ifDescr.2 = eth0
     interfaces.ifTable.ifEntry.ifDescr.3 = eth1
     interfaces.ifTable.ifEntry.ifDescr.4 = eth2
     interfaces.ifTable.ifEntry.ifDescr.5 = eth3
     interfaces.ifTable.ifEntry.ifDescr.6 = bond0
     ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.10.10.10 = 5
     ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.192.168.1.1 = 2
     ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.74.20.94 = 4
     ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.127.0.0.1 = 1

This problem is avoided by loading the bonding driver before any network
drivers participating in a bond. Below is an example of loading the bonding
driver first, the IP address 192.168.1.1 is correctly associated with
ifDescr.2.

     interfaces.ifTable.ifEntry.ifDescr.1 = lo
     interfaces.ifTable.ifEntry.ifDescr.2 = bond0
     interfaces.ifTable.ifEntry.ifDescr.3 = eth0
     interfaces.ifTable.ifEntry.ifDescr.4 = eth1
     interfaces.ifTable.ifEntry.ifDescr.5 = eth2
     interfaces.ifTable.ifEntry.ifDescr.6 = eth3
     ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.10.10.10 = 6
     ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.192.168.1.1 = 2
     ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.74.20.94 = 5
     ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.127.0.0.1 = 1

While some distributions may not report the interface name in ifDescr,
the association between the IP address and IfIndex remains and SNMP
functions such as Interface_Scan_Next will report that association.


Module Parameters
=================

Optional parameters for the bonding driver can be supplied as command line
arguments to the insmod command. Typically, these parameters are specified in
the file /etc/modules.conf (see the manual page for modules.conf). The
available bonding driver parameters are listed below. If a parameter is not
specified the default value is used. When initially configuring a bond, it
is recommended "tail -f /var/log/messages" be run in a separate window to
watch for bonding driver error messages.

It is critical that either the miimon or arp_interval and arp_ip_target
parameters be specified, otherwise serious network degradation will occur
during link failures.

arp_interval

        Specifies the ARP monitoring frequency in milli-seconds.
        If ARP monitoring is used in a load-balancing mode (mode 0 or 2), the
        switch should be configured in a mode that evenly distributes packets
        across all links - such as round-robin. If the switch is configured to
        distribute the packets in an XOR fashion, all replies from the ARP
        targets will be received on the same link which could cause the other
        team members to fail. ARP monitoring should not be used in conjunction
        with miimon. A value of 0 disables ARP monitoring. The default value
        is 0.

arp_ip_target

        Specifies the ip addresses to use when arp_interval is > 0. These
        are the targets of the ARP request sent to determine the health of
        the link to the targets. Specify these values in ddd.ddd.ddd.ddd
        format. Multiple ip adresses must be seperated by a comma. At least
        one ip address needs to be given for ARP monitoring to work. The
        maximum number of targets that can be specified is set at 16.

downdelay

        Specifies the delay time in milli-seconds to disable a link after a
        link failure has been detected. This should be a multiple of miimon
        value, otherwise the value will be rounded. The default value is 0.

lacp_rate

        Option specifying the rate in which we'll ask our link partner to
        transmit LACPDU packets in 802.3ad mode.  Possible values are:

        slow or 0
                Request partner to transmit LACPDUs every 30 seconds (default)

        fast or 1
                Request partner to transmit LACPDUs every 1 second

max_bonds

        Specifies the number of bonding devices to create for this
        instance of the bonding driver.  E.g., if max_bonds is 3, and
        the bonding driver is not already loaded, then bond0, bond1
        and bond2 will be created.  The default value is 1.

miimon

        Specifies the frequency in milli-seconds that MII link monitoring
        will occur. A value of zero disables MII link monitoring. A value
        of 100 is a good starting point. See High Availability section for
        additional information. The default value is 0.

mode

        Specifies one of the bonding policies. The default is
        round-robin (balance-rr).  Possible values are (you can use
        either the text or numeric option):

        balance-rr or 0

                Round-robin policy: Transmit in a sequential order
                from the first available slave through the last. This
                mode provides load balancing and fault tolerance.

        active-backup or 1

                Active-backup policy: Only one slave in the bond is
                active. A different slave becomes active if, and only
                if, the active slave fails. The bond's MAC address is
                externally visible on only one port (network adapter)
                to avoid confusing the switch.  This mode provides
                fault tolerance.

        balance-xor or 2

                XOR policy: Transmit based on [(source MAC address
                XOR'd with destination MAC address) modula slave
                count]. This selects the same slave for each
                destination MAC address. This mode provides load
                balancing and fault tolerance.

        broadcast or 3

                Broadcast policy: transmits everything on all slave
                interfaces. This mode provides fault tolerance.

        802.3ad or 4

                IEEE 802.3ad Dynamic link aggregation. Creates aggregation
                groups that share the same speed and duplex settings.
                Transmits and receives on all slaves in the active
                aggregator.

                Pre-requisites:

                1. Ethtool support in the base drivers for retrieving the
                speed and duplex of each slave.

                2. A switch that supports IEEE 802.3ad Dynamic link
                aggregation.

        balance-tlb or 5

                Adaptive transmit load balancing: channel bonding that does
                not require any special switch support. The outgoing
                traffic is distributed according to the current load
                (computed relative to the speed) on each slave. Incoming
                traffic is received by the current slave. If the receiving
                slave fails, another slave takes over the MAC address of
                the failed receiving slave.

                Prerequisite:

                Ethtool support in the base drivers for retrieving the
                speed of each slave.

        balance-alb or 6

                Adaptive load balancing: includes balance-tlb + receive
                load balancing (rlb) for IPV4 traffic and does not require
                any special switch support. The receive load balancing is
                achieved by ARP negotiation. The bonding driver intercepts
                the ARP Replies sent by the server on their way out and
                overwrites the src hw address with the unique hw address of
                one of the slaves in the bond such that different clients
                use different hw addresses for the server.

                Receive traffic from connections created by the server is
                also balanced. When the server sends an ARP Request the
                bonding driver copies and saves the client's IP information
                from the ARP. When the ARP Reply arrives from the client,
                its hw address is retrieved and the bonding driver
                initiates an ARP reply to this client assigning it to one
                of the slaves in the bond. A problematic outcome of using
                ARP negotiation for balancing is that each time that an ARP
                request is broadcasted it uses the hw address of the
                bond. Hence, clients learn the hw address of the bond and
                the balancing of receive traffic collapses to the current
                salve. This is handled by sending updates (ARP Replies) to
                all the clients with their assigned hw address such that
                the traffic is redistributed. Receive traffic is also
                redistributed when a new slave is added to the bond and
                when an inactive slave is re-activated. The receive load is
                distributed sequentially (round robin) among the group of
                highest speed slaves in the bond.

                When a link is reconnected or a new slave joins the bond
                the receive traffic is redistributed among all active
                slaves in the bond by intiating ARP Replies with the
                selected mac address to each of the clients. The updelay
                modeprobe parameter must be set to a value equal or greater
                than the switch's forwarding delay so that the ARP Replies
                sent to the clients will not be blocked by the switch.

                Prerequisites:

                1. Ethtool support in the base drivers for retrieving the
                speed of each slave.

                2. Base driver support for setting the hw address of a
                device also when it is open. This is required so that there
                will always be one slave in the team using the bond hw
                address (the curr_active_slave) while having a unique hw
                address for each slave in the bond. If the curr_active_slave
                fails it's hw address is swapped with the new curr_active_slave
                that was chosen.

primary

        A string (eth0, eth2, etc) to equate to a primary device. If this
        value is entered, and the device is on-line, it will be used first
        as the output media. Only when this device is off-line, will
        alternate devices be used. Otherwise, once a failover is detected
        and a new default output is chosen, it will remain the output media
        until it too fails. This is useful when one slave was preferred
        over another, i.e. when one slave is 1000Mbps and another is
        100Mbps. If the 1000Mbps slave fails and is later restored, it may
        be preferred the faster slave gracefully become the active slave -
        without deliberately failing the 100Mbps slave. Specifying a
        primary is only valid in active-backup mode.

updelay

        Specifies the delay time in milli-seconds to enable a link after a
        link up status has been detected. This should be a multiple of miimon
        value, otherwise the value will be rounded. The default value is 0.

use_carrier

        Specifies whether or not miimon should use MII or ETHTOOL
        ioctls vs. netif_carrier_ok() to determine the link status.
        The MII or ETHTOOL ioctls are less efficient and utilize a
        deprecated calling sequence within the kernel.  The
        netif_carrier_ok() relies on the device driver to maintain its
        state with netif_carrier_on/off; at this writing, most, but
        not all, device drivers support this facility.

        If bonding insists that the link is up when it should not be,
        it may be that your network device driver does not support
        netif_carrier_on/off.  This is because the default state for
        netif_carrier is "carrier on." In this case, disabling
        use_carrier will cause bonding to revert to the MII / ETHTOOL
        ioctl method to determine the link state.

        A value of 1 enables the use of netif_carrier_ok(), a value of
        0 will use the deprecated MII / ETHTOOL ioctls.  The default
        value is 1.


Configuring Multiple Bonds
==========================

If several bonding interfaces are required, either specify the max_bonds
parameter (described above), or load the driver multiple times.  Using
the max_bonds parameter is less complicated, but has the limitation that
all bonding instances created will have the same options.  Loading the
driver multiple times allows each instance of the driver to have differing
options.

For example, to configure two bonding interfaces, one with mii link
monitoring performed every 100 milliseconds, and one with ARP link
monitoring performed every 200 milliseconds, the /etc/conf.modules should
resemble the following:

alias bond0 bonding
alias bond1 bonding

options bond0 miimon=100
options bond1 -o bonding1 arp_interval=200 arp_ip_target=10.0.0.1

Configuring Multiple ARP Targets
================================

While ARP monitoring can be done with just one target, it can be useful
in a High Availability setup to have several targets to monitor. In the
case of just one target,  the target itself may go down or have a problem
making it unresponsive to ARP requests. Having an additional target (or
several) increases the reliability of the ARP monitoring.

Multiple ARP targets must be seperated by commas as follows:

# example options for ARP monitoring with three targets
alias bond0 bonding
options bond0 arp_interval=60 arp_ip_target=192.168.0.1,192.168.0.3,192.168.0.9

For just a single target the options would resemble:

# example options for ARP monitoring with one target
alias bond0 bonding
options bond0 arp_interval=60 arp_ip_target=192.168.0.100

Potential Problems When Using ARP Monitor
=========================================

1. Driver support

The ARP monitor relies on the network device driver to maintain two
statistics: the last receive time (dev->last_rx), and the last
transmit time (dev->trans_start).  If the network device driver does
not update one or both of these, then the typical result will be that,
upon startup, all links in the bond will immediately be declared down,
and remain that way.  A network monitoring tool (tcpdump, e.g.) will
show ARP requests and replies being sent and received on the bonding
device.

The possible resolutions for this are to (a) fix the device driver, or
(b) discontinue the ARP monitor (using miimon as an alternative, for
example).

2. Adventures in Routing

When bonding is set up with the ARP monitor, it is important that the
slave devices not have routes that supercede routes of the master (or,
generally, not have routes at all).  For example, suppose the bonding
device bond0 has two slaves, eth0 and eth1, and the routing table is
as follows:

Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
10.0.0.0        0.0.0.0         255.255.0.0     U        40 0          0 eth0
10.0.0.0        0.0.0.0         255.255.0.0     U        40 0          0 eth1
10.0.0.0        0.0.0.0         255.255.0.0     U        40 0          0 bond0
127.0.0.0       0.0.0.0         255.0.0.0       U        40 0          0 lo

In this case, the ARP monitor (and ARP itself) may become confused,
because ARP requests will be sent on one interface (bond0), but the
corresponding reply will arrive on a different interface (eth0).  This
reply looks to ARP as an unsolicited ARP reply (because ARP matches
replies on an interface basis), and is discarded.  This will likely
still update the receive/transmit times in the driver, but will lose
packets.

The resolution here is simply to insure that slaves do not have routes
of their own, and if for some reason they must, those routes do not
supercede routes of their master.  This should generally be the case,
but unusual configurations or errant manual or automatic static route
additions may cause trouble.

Switch Configuration
====================

While the switch does not need to be configured when the active-backup,
balance-tlb or balance-alb policies (mode=1,5,6) are used, it does need to
be configured for the round-robin, XOR, broadcast, or 802.3ad policies
(mode=0,2,3,4).


Verifying Bond Configuration
============================

1) Bonding information files
----------------------------
The bonding driver information files reside in the /proc/net/bonding directory.

Sample contents of /proc/net/bonding/bond0 after the driver is loaded with
parameters of mode=0 and miimon=1000 is shown below.

        Bonding Mode: load balancing (round-robin)
        Currently Active Slave: eth0
        MII Status: up
        MII Polling Interval (ms): 1000
        Up Delay (ms): 0
        Down Delay (ms): 0

        Slave Interface: eth1
        MII Status: up
        Link Failure Count: 1

        Slave Interface: eth0
        MII Status: up
        Link Failure Count: 1

2) Network verification
-----------------------
The network configuration can be verified using the ifconfig command. In
the example below, the bond0 interface is the master (MASTER) while eth0 and
eth1 are slaves (SLAVE). Notice all slaves of bond0 have the same MAC address
(HWaddr) as bond0 for all modes except TLB and ALB that require a unique MAC
address for each slave.

[root]# /sbin/ifconfig
bond0     Link encap:Ethernet  HWaddr 00:C0:F0:1F:37:B4
          inet addr:XXX.XXX.XXX.YYY  Bcast:XXX.XXX.XXX.255  Mask:255.255.252.0
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:7224794 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3286647 errors:1 dropped:0 overruns:1 carrier:0
          collisions:0 txqueuelen:0

eth0      Link encap:Ethernet  HWaddr 00:C0:F0:1F:37:B4
          inet addr:XXX.XXX.XXX.YYY  Bcast:XXX.XXX.XXX.255  Mask:255.255.252.0
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:3573025 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1643167 errors:1 dropped:0 overruns:1 carrier:0
          collisions:0 txqueuelen:100
          Interrupt:10 Base address:0x1080

eth1      Link encap:Ethernet  HWaddr 00:C0:F0:1F:37:B4
          inet addr:XXX.XXX.XXX.YYY  Bcast:XXX.XXX.XXX.255  Mask:255.255.252.0
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:3651769 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1643480 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          Interrupt:9 Base address:0x1400


Frequently Asked Questions
==========================

1.  Is it SMP safe?

        Yes. The old 2.0.xx channel bonding patch was not SMP safe.
        The new driver was designed to be SMP safe from the start.

2.  What type of cards will work with it?

        Any Ethernet type cards (you can even mix cards - a Intel
        EtherExpress PRO/100 and a 3com 3c905b, for example).
        You can even bond together Gigabit Ethernet cards!

3.  How many bonding devices can I have?

        There is no limit.

4.  How many slaves can a bonding device have?

        Limited by the number of network interfaces Linux supports and/or the
        number of network cards you can place in your system.

5.  What happens when a slave link dies?

        If your ethernet cards support MII or ETHTOOL link status monitoring
        and the MII monitoring has been enabled in the driver (see description
        of module parameters), there will be no adverse consequences. This
        release of the bonding driver knows how to get the MII information and
        enables or disables its slaves according to their link status.
        See section on High Availability for additional information.

        For ethernet cards not supporting MII status, the arp_interval and
        arp_ip_target parameters must be specified for bonding to work
        correctly. If packets have not been sent or received during the
        specified arp_interval duration, an ARP request is sent to the
        targets to generate send and receive traffic. If after this
        interval, either the successful send and/or receive count has not
        incremented, the next slave in the sequence will become the active
        slave.

        If neither mii_monitor and arp_interval is configured, the bonding
        driver will not handle this situation very well. The driver will
        continue to send packets but some packets will be lost. Retransmits
        will cause serious degradation of performance (in the case when one
        of two slave links fails, 50% packets will be lost, which is a serious
        problem for both TCP and UDP).

6.  Can bonding be used for High Availability?

        Yes, if you use MII monitoring and ALL your cards support MII link
        status reporting. See section on High Availability for more
        information.

7.  Which switches/systems does it work with?

        In round-robin and XOR mode, it works with systems that support
        trunking:

        * Many Cisco switches and routers (look for EtherChannel support).
        * SunTrunking software.
        * Alteon AceDirector switches / WebOS (use Trunks).
        * BayStack Switches (trunks must be explicitly configured). Stackable
          models (450) can define trunks between ports on different physical
          units.
        * Linux bonding, of course !

        In 802.3ad mode, it works with with systems that support IEEE 802.3ad
        Dynamic Link Aggregation:

        * Extreme networks Summit 7i (look for link-aggregation).
        * Many Cisco switches and routers (look for LACP support; this may
          require an upgrade to your IOS software; LACP support was added
          by Cisco in late 2002).
        * Foundry Big Iron 4000

        In active-backup, balance-tlb and balance-alb modes, it should work
        with any Layer-II switch.


8.  Where does a bonding device get its MAC address from?

        If not explicitly configured with ifconfig, the MAC address of the
        bonding device is taken from its first slave device. This MAC address
        is then passed to all following slaves and remains persistent (even if
        the the first slave is removed) until the bonding device is brought
        down or reconfigured.

        If you wish to change the MAC address, you can set it with ifconfig:

          # ifconfig bond0 hw ether 00:11:22:33:44:55

        The MAC address can be also changed by bringing down/up the device
        and then changing its slaves (or their order):

          # ifconfig bond0 down ; modprobe -r bonding
          # ifconfig bond0 .... up
          # ifenslave bond0 eth...

        This method will automatically take the address from the next slave
        that will be added.

        To restore your slaves' MAC addresses, you need to detach them
        from the bond (`ifenslave -d bond0 eth0'). The bonding driver will then
        restore the MAC addresses that the slaves had before they were enslaved.

9.  Which transmit polices can be used?

        Round-robin, based on the order of enslaving, the output device
        is selected base on the next available slave. Regardless of
        the source and/or destination of the packet.

        Active-backup policy that ensures that one and only one device will
        transmit at any given moment. Active-backup policy is useful for
        implementing high availability solutions using two hubs (see
        section on High Availability).

        XOR, based on (src hw addr XOR dst hw addr) % slave count. This
        policy selects the same slave for each destination hw address.

        Broadcast policy transmits everything on all slave interfaces.

        802.3ad, based on XOR but distributes traffic among all interfaces
        in the active aggregator.

        Transmit load balancing (balance-tlb) balances the traffic
        according to the current load on each slave. The balancing is
        clients based and the least loaded slave is selected for each new
        client. The load of each slave is calculated relative to its speed
        and enables load balancing in mixed speed teams.

        Adaptive load balancing (balance-alb) uses the Transmit load
        balancing for the transmit load. The receive load is balanced only
        among the group of highest speed active slaves in the bond. The
        load is distributed with round-robin i.e. next available slave in
        the high speed group of active slaves.

High Availability
=================

To implement high availability using the bonding driver, the driver needs to be
compiled as a module, because currently it is the only way to pass parameters
to the driver. This may change in the future.

High availability is achieved by using MII or ETHTOOL status reporting. You
need to verify that all your interfaces support MII or ETHTOOL link status
reporting.  On Linux kernel 2.2.17, all the 100 Mbps capable drivers and
yellowfin gigabit driver support MII. To determine if ETHTOOL link reporting
is available for interface eth0, type "ethtool eth0" and the "Link detected:"
line should contain the correct link status. If your system has an interface
that does not support MII or ETHTOOL status reporting, a failure of its link
will not be detected! A message indicating MII and ETHTOOL is not supported by
a network driver is logged when the bonding driver is loaded with a non-zero
miimon value.

The bonding driver can regularly check all its slaves links using the ETHTOOL
IOCTL (ETHTOOL_GLINK command) or by checking the MII status registers. The
check interval is specified by the module argument "miimon" (MII monitoring).
It takes an integer that represents the checking time in milliseconds. It
should not come to close to (1000/HZ) (10 milli-seconds on i386) because it
may then reduce the system interactivity. A value of 100 seems to be a good
starting point. It means that a dead link will be detected at most 100
milli-seconds after it goes down.

Example:

   # modprobe bonding miimon=100

Or, put the following lines in /etc/modules.conf:

   alias bond0 bonding
   options bond0 miimon=100

There are currently two policies for high availability. They are dependent on
whether:

   a) hosts are connected to a single host or switch that support trunking

   b) hosts are connected to several different switches or a single switch that
      does not support trunking


1) High Availability on a single switch or host - load balancing
----------------------------------------------------------------
It is the easiest to set up and to understand. Simply configure the
remote equipment (host or switch) to aggregate traffic over several
ports (Trunk, EtherChannel, etc.) and configure the bonding interfaces.
If the module has been loaded with the proper MII option, it will work
automatically. You can then try to remove and restore different links
and see in your logs what the driver detects. When testing, you may
encounter problems on some buggy switches that disable the trunk for a
long time if all ports in a trunk go down. This is not Linux, but really
the switch (reboot it to ensure).

Example 1 : host to host at twice the speed

          +----------+                          +----------+
          |          |eth0                  eth0|          |
          | Host A   +--------------------------+  Host B  |
          |          +--------------------------+          |
          |          |eth1                  eth1|          |
          +----------+                          +----------+

  On each host :
     # modprobe bonding miimon=100
     # ifconfig bond0 addr
     # ifenslave bond0 eth0 eth1

Example 2 : host to switch at twice the speed

          +----------+                          +----------+
          |          |eth0                 port1|          |
          | Host A   +--------------------------+  switch  |
          |          +--------------------------+          |
          |          |eth1                 port2|          |
          +----------+                          +----------+

  On host A :                             On the switch :
     # modprobe bonding miimon=100           # set up a trunk on port1
     # ifconfig bond0 addr                     and port2
     # ifenslave bond0 eth0 eth1


2) High Availability on two or more switches (or a single switch without
   trunking support)
---------------------------------------------------------------------------
This mode is more problematic because it relies on the fact that there
are multiple ports and the host's MAC address should be visible on one
port only to avoid confusing the switches.

If you need to know which interface is the active one, and which ones are
backup, use ifconfig. All backup interfaces have the NOARP flag set.

To use this mode, pass "mode=1" to the module at load time :

    # modprobe bonding miimon=100 mode=active-backup

        or:

    # modprobe bonding miimon=100 mode=1

Or, put in your /etc/modules.conf :

    alias bond0 bonding
    options bond0 miimon=100 mode=active-backup

Example 1: Using multiple host and multiple switches to build a "no single
point of failure" solution.


                |                                     |
                |port3                           port3|
          +-----+----+                          +-----+----+
          |          |port7       ISL      port7|          |
          | switch A +--------------------------+ switch B |
          |          +--------------------------+          |
          |          |port8                port8|          |
          +----++----+                          +-----++---+
          port2||port1                           port1||port2
               ||             +-------+               ||
               |+-------------+ host1 +---------------+|
               |         eth0 +-------+ eth1           |
               |                                       |
               |              +-------+                |
               +--------------+ host2 +----------------+
                         eth0 +-------+ eth1

In this configuration, there is an ISL - Inter Switch Link (could be a trunk),
several servers (host1, host2 ...) attached to both switches each, and one or
more ports to the outside world (port3...). One and only one slave on each host
is active at a time, while all links are still monitored (the system can
detect a failure of active and backup links).

Each time a host changes its active interface, it sticks to the new one until
it goes down. In this example, the hosts are negligibly affected by the
expiration time of the switches' forwarding tables.

If host1 and host2 have the same functionality and are used in load balancing
by another external mechanism, it is good to have host1's active interface
connected to one switch and host2's to the other. Such system will survive
a failure of a single host, cable, or switch. The worst thing that may happen
in the case of a switch failure is that half of the hosts will be temporarily
unreachable until the other switch expires its tables.

Example 2: Using multiple ethernet cards connected to a switch to configure
           NIC failover (switch is not required to support trunking).


          +----------+                          +----------+
          |          |eth0                 port1|          |
          | Host A   +--------------------------+  switch  |
          |          +--------------------------+          |
          |          |eth1                 port2|          |
          +----------+                          +----------+

  On host A :                                 On the switch :
     # modprobe bonding miimon=100 mode=1     # (optional) minimize the time
     # ifconfig bond0 addr                    # for table expiration
     # ifenslave bond0 eth0 eth1

Each time the host changes its active interface, it sticks to the new one until
it goes down. In this example, the host is strongly affected by the expiration
time of the switch forwarding table.


3) Adapting to your switches' timing
------------------------------------
If your switches take a long time to go into backup mode, it may be
desirable not to activate a backup interface immediately after a link goes
down. It is possible to delay the moment at which a link will be
completely disabled by passing the module parameter "downdelay" (in
milliseconds, must be a multiple of miimon).

When a switch reboots, it is possible that its ports report "link up" status
before they become usable. This could fool a bond device by causing it to
use some ports that are not ready yet. It is possible to delay the moment at
which an active link will be reused by passing the module parameter "updelay"
(in milliseconds, must be a multiple of miimon).

A similar situation can occur when a host re-negotiates a lost link with the
switch (a case of cable replacement).

A special case is when a bonding interface has lost all slave links. Then the
driver will immediately reuse the first link that goes up, even if updelay
parameter was specified. (If there are slave interfaces in the "updelay" state,
the interface that first went into that state will be immediately reused.) This
allows to reduce down-time if the value of updelay has been overestimated.

Examples :

    # modprobe bonding miimon=100 mode=1 downdelay=2000 updelay=5000
    # modprobe bonding miimon=100 mode=balance-rr downdelay=0 updelay=5000


Promiscuous Sniffing notes
==========================

If you wish to bond channels together for a network sniffing
application --- you wish to run tcpdump, or ethereal, or an IDS like
snort, with its input aggregated from multiple interfaces using the
bonding driver --- then you need to handle the Promiscuous interface
setting by hand. Specifically, when you "ifconfing bond0 up" you
must add the promisc flag there; it will be propagated down to the
slave interfaces at ifenslave time; a full example might look like:

   grep bond0 /etc/modules.conf || echo alias bond0 bonding >/etc/modules.conf
   ifconfig bond0 promisc up
   for if in eth1 eth2 ...;do
       ifconfig $if up
       ifenslave bond0 $if
   done
   snort ... -i bond0 ...

Ifenslave also wants to propagate addresses from interface to
interface, appropriately for its design functions in HA and channel
capacity aggregating; but it works fine for unnumbered interfaces;
just ignore all the warnings it emits.


8021q VLAN support
==================

It is possible to configure VLAN devices over a bond interface using the 8021q
driver. However, only packets coming from the 8021q driver and passing through
bonding will be tagged by default. Self generated packets, like bonding's
learning packets or ARP packets generated by either ALB mode or the ARP
monitor mechanism, are tagged internally by bonding itself. As a result,
bonding has to "learn" what VLAN IDs are configured on top of it, and it uses
those IDs to tag self generated packets.

For simplicity reasons, and to support the use of adapters that can do VLAN
hardware acceleration offloding, the bonding interface declares itself as
fully hardware offloaing capable, it gets the add_vid/kill_vid notifications
to gather the necessary information, and it propagates those actions to the
slaves.
In case of mixed adapter types, hardware accelerated tagged packets that should
go through an adapter that is not offloading capable are "un-accelerated" by the
bonding driver so the VLAN tag sits in the regular location.

VLAN interfaces *must* be added on top of a bonding interface only after
enslaving at least one slave. This is because until the first slave is added the
bonding interface has a HW address of 00:00:00:00:00:00, which will be copied by
the VLAN interface when it is created.

Notice that a problem would occur if all slaves are released from a bond that
still has VLAN interfaces on top of it. When later coming to add new slaves, the
bonding interface would get a HW address from the first slave, which might not
match that of the VLAN interfaces. It is recommended that either all VLANs are
removed and then re-added, or to manually set the bonding interface's HW
address so it matches the VLAN's. (Note: changing a VLAN interface's HW address
would set the underlying device -- i.e. the bonding interface -- to promiscouos
mode, which might not be what you want).


Limitations
===========
The main limitations are :
  - only the link status is monitored. If the switch on the other side is
    partially down (e.g. doesn't forward anymore, but the link is OK), the link
    won't be disabled. Another way to check for a dead link could be to count
    incoming frames on a heavily loaded host. This is not applicable to small
    servers, but may be useful when the front switches send multicast
    information on their links (e.g. VRRP), or even health-check the servers.
    Use the arp_interval/arp_ip_target parameters to count incoming/outgoing
    frames.



Resources and Links
===================

Current development on this driver is posted to:
 - http://www.sourceforge.net/projects/bonding/

Donald Becker's Ethernet Drivers and diag programs may be found at :
 - http://www.scyld.com/network/

You will also find a lot of information regarding Ethernet, NWay, MII, etc. at
www.scyld.com.

Patches for 2.2 kernels are at Willy Tarreau's site :
 - http://wtarreau.free.fr/pub/bonding/
 - http://www-miaif.lip6.fr/~tarreau/pub/bonding/

To get latest informations about Linux Kernel development, please consult
the Linux Kernel Mailing List Archives at :
   http://www.ussg.iu.edu/hypermail/linux/kernel/

-- END --

Compare with Previous | Blame | View Log

powered by: WebSVN 2.1.0

© copyright 1999-2024 OpenCores.org, equivalent to Oliscience, all rights reserved. OpenCores®, registered trademark.