The MSI Driver Guide HOWTO
Tom L Nguyen tom.l.nguyen@intel.com
10/03/2003
Revised Feb 12, 2004 by Martine Silbermann
email: Martine.Silbermann@hp.com
Revised Jun 25, 2004 by Tom L Nguyen

1. About this guide

This guide describes the basics of Message Signaled Interrupts (MSI),
the advantages of using MSI over traditional interrupt mechanisms,
and how to enable your driver to use MSI or MSI-X. Also included is
a Frequently Asked Questions (FAQ) section.

1.1 Terminology

PCI devices can be single-function or multi-function. In either case,
when this text talks about enabling or disabling MSI on a "device
function," it is referring to one specific PCI device and function and
not to all functions on a PCI device (unless the PCI device has only
one function).

2. Copyright 2003 Intel Corporation

3. What is MSI/MSI-X?

Message Signaled Interrupt (MSI), as described in the PCI Local Bus
Specification Revision 2.3 or later, is an optional feature for PCI
devices and a required feature for PCI Express devices. MSI enables a
device function to request service by sending an Inbound Memory Write
on its PCI bus to the FSB as a Message Signaled Interrupt transaction.
Because MSI is generated in the form of a Memory Write, all transaction
conditions, such as a Retry, Master-Abort, Target-Abort or normal
completion, are supported.

A PCI device that supports MSI must also support the pin IRQ assertion
interrupt mechanism to provide backward compatibility for systems that
do not support MSI. In systems which support MSI, the bus driver is
responsible for initializing the message address and message data of
the device function's MSI/MSI-X capability structure during device
initial configuration.

An MSI capable device function indicates MSI support by implementing
the MSI/MSI-X capability structure in its PCI capability list.
The device function may implement both the MSI capability structure and
the MSI-X capability structure; however, the bus driver should not
enable both.

The MSI capability structure contains the Message Control register,
the Message Address register and the Message Data register. These
registers provide the bus driver control over MSI. The Message Control
register indicates the MSI capability supported by the device. The
Message Address register specifies the target address and the Message
Data register specifies the characteristics of the message. To request
service, the device function writes the content of the Message Data
register to the target address. The device and its software driver
are prohibited from writing to these registers.

The MSI-X capability structure is an optional extension to MSI. It
uses an independent and separate capability structure. There are
some key advantages to implementing the MSI-X capability structure
over the MSI capability structure as described below.

- Support a larger maximum number of vectors per function.

- Provide the ability for system software to configure
  each vector with an independent message address and message
  data, specified by a table that resides in Memory Space.

- MSI and MSI-X both support per-vector masking. Per-vector
  masking is an optional extension of MSI but a required
  feature for MSI-X. Per-vector masking provides the kernel the
  ability to mask/unmask a single MSI while running its
  interrupt service routine. If per-vector masking is
  not supported, then the device driver should provide the
  hardware/software synchronization to ensure that the device
  generates MSI when the driver wants it to do so.

4. Why use MSI?

As a benefit to the simplification of board design, MSI allows board
designers to remove out-of-band interrupt routing. MSI is another
step towards a legacy-free environment.

Due to increasing pressure on chipset and processor packages to
reduce pin count, the need for interrupt pins is expected to
diminish over time.
Devices, due to pin constraints, may implement
messages to increase performance.

PCI Express endpoints use INTx emulation (in-band messages) instead
of IRQ pin assertion. Using INTx emulation requires interrupt
sharing among devices connected to the same node (PCI bridge) while
MSI is unique (non-shared) and does not require BIOS configuration
support. As a result, the PCI Express technology requires MSI
support for better interrupt performance.

Using MSI enables the device functions to support two or more
vectors, which can be configured to target different CPUs to
increase scalability.

5. Configuring a driver to use MSI/MSI-X

By default, the kernel will not automatically enable MSI/MSI-X on
devices that support this capability. The CONFIG_PCI_MSI kernel option
must be selected to enable MSI/MSI-X support.

5.1 Including MSI/MSI-X support into the kernel

To allow MSI/MSI-X capable device drivers to selectively enable
MSI/MSI-X (using pci_enable_msi()/pci_enable_msix() as described
below), the VECTOR based scheme needs to be enabled by setting
CONFIG_PCI_MSI during kernel config.

Since the target of the inbound message is the local APIC,
CONFIG_X86_LOCAL_APIC must be enabled as well as CONFIG_PCI_MSI.

5.2 Configuring for MSI support

Due to the non-contiguous fashion in vector assignment of the
existing Linux kernel, this version does not support multiple
messages regardless of whether a device function is capable of
supporting more than one vector. Enabling MSI on a device function's
MSI capability structure requires a device driver to call the function
pci_enable_msi() explicitly.

5.2.1 API pci_enable_msi

int pci_enable_msi(struct pci_dev *dev)

With this new API, a device driver that wants to have MSI
enabled on its device function must call this API to enable MSI.
A successful call will initialize the MSI capability structure
with ONE vector, regardless of whether a device function is
capable of supporting multiple messages. The new MSI vector replaces
the pre-assigned value in dev->irq.
To avoid a conflict between the newly assigned vector and the existing
pre-assigned vector, a device driver must call this API before calling
request_irq().

5.2.2 API pci_disable_msi

void pci_disable_msi(struct pci_dev *dev)

This API should always be used to undo the effect of pci_enable_msi()
when a device driver is unloading. This API restores dev->irq with
the pre-assigned IOAPIC vector and switches a device's interrupt
mode to PCI pin-irq assertion/INTx emulation mode.

Note that a device driver should always call free_irq() on the MSI vector
that it has done request_irq() on before calling this API. Failure to do
so results in a BUG_ON(), and the device is left with MSI enabled and
leaks its vector.

5.2.3 MSI mode vs. legacy mode diagram

The diagram below shows the events which switch the interrupt
mode on the MSI-capable device function between MSI mode and
PIN-IRQ assertion mode.

 ------------   pci_enable_msi    ------------------------
|            | <===============  |                        |
| MSI MODE   |                   | PIN-IRQ ASSERTION MODE |
|            | ===============>  |                        |
 ------------   pci_disable_msi   ------------------------

Figure 1. MSI Mode vs. Legacy Mode

In Figure 1, a device operates by default in legacy mode. Legacy
in this context means PCI pin-irq assertion or PCI-Express INTx
emulation. A successful MSI request (using pci_enable_msi()) switches
a device's interrupt mode to MSI mode. A pre-assigned IOAPIC vector
stored in dev->irq will be saved by the PCI subsystem and a newly
assigned MSI vector will replace dev->irq.

To return to its default mode, a device driver should always call
pci_disable_msi() to undo the effect of pci_enable_msi(). Note that a
device driver should always call free_irq() on the MSI vector it has
done request_irq() on before calling pci_disable_msi(). Failure to do
so results in a BUG_ON(), and the device is left with MSI enabled and
leaks its vector.
Otherwise, the PCI subsystem restores a device's
dev->irq with a pre-assigned IOAPIC vector and marks the released
MSI vector as unused.

Once the vector is marked as unused, there is no guarantee that the PCI
subsystem will reserve this MSI vector for a device. Depending on
the availability of current PCI vector resources and the number of
MSI/MSI-X requests from other drivers, this MSI may be re-assigned.

For the case where the PCI subsystem re-assigns this MSI vector to
another driver, a request to switch back to MSI mode may result
in being assigned a different MSI vector or a failure if no more
vectors are available.

5.3 Configuring for MSI-X support

Due to the ability of the system software to configure each vector of
the MSI-X capability structure with an independent message address
and message data, the non-contiguous fashion in vector assignment of
the existing Linux kernel has no impact on supporting multiple
messages on an MSI-X capable device function. Enabling MSI-X on
a device function's MSI-X capability structure requires its device
driver to call the function pci_enable_msix() explicitly.

The function pci_enable_msix(), once invoked, enables either
all or nothing, depending on the current availability of PCI vector
resources. If the PCI vector resources are available for the number
of vectors requested by a device driver, this function will configure
the MSI-X table of the MSI-X capability structure of a device with
requested messages. For example, a device may be capable of supporting
a maximum of 32 vectors while its software driver may request only
4 vectors.
It is recommended that the device driver call this function once
during its initialization phase.

Unlike the function pci_enable_msi(), the function pci_enable_msix()
does not replace the pre-assigned IOAPIC dev->irq with a new MSI
vector because the PCI subsystem writes the 1:1 vector-to-entry mapping
into the 'vector' field of each element of the array passed as the
second argument. Note that the pre-assigned IOAPIC dev->irq is valid
only if the device operates in PIN-IRQ assertion mode. In MSI-X mode,
any attempt by the device driver to use dev->irq to request interrupt
service may result in unpredictable behavior.

For each MSI-X vector granted, a device driver is responsible for calling
other functions like request_irq(), enable_irq(), etc. to enable
this vector with its corresponding interrupt service handler. It is
a device driver's choice to assign all vectors with the same
interrupt service handler or each vector with a unique interrupt
service handler.

5.3.1 Handling MMIO address space of MSI-X Table

The PCI 3.0 specification has implementation notes that MMIO address
space for a device's MSI-X structure should be isolated so that the
software system can set different pages for controlling accesses to the
MSI-X structure. The implementation of MSI support requires the PCI
subsystem, not a device driver, to maintain full control of the MSI-X
table/MSI-X PBA (Pending Bit Array) and MMIO address space of the MSI-X
table/MSI-X PBA. A device driver is prohibited from requesting the MMIO
address space of the MSI-X table/MSI-X PBA. Otherwise, the PCI subsystem
will fail to enable MSI-X on its hardware device when the driver calls
the function pci_enable_msix().

5.3.2 API pci_enable_msix

int pci_enable_msix(struct pci_dev *dev, struct msix_entry *entries, int nvec)

This API enables a device driver to request the PCI subsystem
to enable MSI-X messages on its hardware device.
Depending on the availability of PCI vector resources, the PCI
subsystem enables either all or none of the requested vectors.

Argument 'dev' points to the device (pci_dev) structure.

Argument 'entries' is a pointer to an array of msix_entry structs.
The number of entries is indicated in argument 'nvec'.
struct msix_entry is defined in /driver/pci/msi.h:

struct msix_entry {
	u16 vector; /* kernel uses to write alloc vector */
	u16 entry;  /* driver uses to specify entry */
};

A device driver is responsible for initializing the field 'entry' of
each element with a unique entry supported by the MSI-X table.
Otherwise, -EINVAL will be returned as a result. A successful return of
zero indicates the PCI subsystem completed initializing each of the
requested entries of the MSI-X table with message address and message
data. Last but not least, the PCI subsystem will write the 1:1
vector-to-entry mapping into the field 'vector' of each element. A
device driver is responsible for keeping track of allocated MSI-X
vectors in its internal data structure.

A return of zero indicates that the requested number of MSI-X vectors
was successfully allocated. A return of greater than zero indicates an
MSI-X vector shortage. A return of less than zero indicates
a failure. This failure may be a result of duplicate entries
specified in the second argument, or a result of no available vector,
or a result of failing to initialize MSI-X table entries.

5.3.3 API pci_disable_msix

void pci_disable_msix(struct pci_dev *dev)

This API should always be used to undo the effect of pci_enable_msix()
when a device driver is unloading. Note that a device driver should
always call free_irq() on all MSI-X vectors it has done request_irq()
on before calling this API. Failure to do so results in a BUG_ON(), and
the device is left with MSI-X enabled and leaks its vectors.

5.3.4 MSI-X mode vs. legacy mode diagram

The diagram below shows the events which switch the interrupt
mode on the MSI-X capable device function between MSI-X mode and
PIN-IRQ assertion mode (legacy).

 ------------   pci_enable_msix(,,n)   ------------------------
|            | <===============       |                        |
| MSI-X MODE |                        | PIN-IRQ ASSERTION MODE |
|            | ===============>       |                        |
 ------------   pci_disable_msix       ------------------------

Figure 2. MSI-X Mode vs. Legacy Mode

In Figure 2, a device operates by default in legacy mode. A
successful MSI-X request (using pci_enable_msix()) switches a
device's interrupt mode to MSI-X mode. A pre-assigned IOAPIC vector
stored in dev->irq will be saved by the PCI subsystem; however,
unlike MSI mode, the PCI subsystem will not replace dev->irq with an
assigned MSI-X vector because the PCI subsystem already writes the 1:1
vector-to-entry mapping into the field 'vector' of each element
specified in the second argument.

To return to its default mode, a device driver should always call
pci_disable_msix() to undo the effect of pci_enable_msix(). Note that
a device driver should always call free_irq() on all MSI-X vectors it
has done request_irq() on before calling pci_disable_msix(). Failure
to do so results in a BUG_ON(), and the device is left with MSI-X
enabled and leaks its vectors. Otherwise, the PCI subsystem switches a
device function's interrupt mode from MSI-X mode to legacy mode and
marks all allocated MSI-X vectors as unused.

Once the vectors are marked as unused, there is no guarantee that the
PCI subsystem will reserve these MSI-X vectors for a device.
Depending on
the availability of current PCI vector resources and the number of
MSI/MSI-X requests from other drivers, these MSI-X vectors may be
re-assigned.

For the case where the PCI subsystem re-assigned these MSI-X vectors
to other drivers, a request to switch back to MSI-X mode may result
in being assigned another set of MSI-X vectors or a failure if no
more vectors are available.

5.4 Handling function implementing both MSI and MSI-X capabilities

For the case where a function implements both MSI and MSI-X
capabilities, the PCI subsystem enables a device to run either in MSI
mode or MSI-X mode but not both. A device driver determines whether it
wants MSI or MSI-X enabled on its hardware device. Once a device
driver requests MSI, for example, it is prohibited from requesting
MSI-X; in other words, a device driver is not permitted to ping-pong
between MSI mode and MSI-X mode during run-time.

5.5 Hardware requirements for MSI/MSI-X support

MSI/MSI-X support requires support from both system hardware and
individual hardware device functions.

5.5.1 Required x86 hardware support

Since the target of the MSI address is the CPU's local APIC, enabling
MSI/MSI-X support in the Linux kernel is dependent on whether existing
system hardware supports local APIC. Users should verify that their
system supports local APIC operation by testing that it runs when
CONFIG_X86_LOCAL_APIC=y.

In an SMP environment, CONFIG_X86_LOCAL_APIC is automatically set;
however, in a UP environment, users must manually set
CONFIG_X86_LOCAL_APIC. Once CONFIG_X86_LOCAL_APIC=y, setting
CONFIG_PCI_MSI enables the VECTOR based scheme and the option for
MSI-capable device drivers to selectively enable MSI/MSI-X.

Note that the CONFIG_X86_IO_APIC setting is irrelevant because each
MSI/MSI-X vector is newly allocated at runtime and MSI/MSI-X support
does not depend on BIOS support.
This key independence enables MSI/MSI-X
support on future IOxAPIC free platforms.

5.5.2 Device hardware support

The hardware device function supports MSI by indicating the
MSI/MSI-X capability structure on its PCI capability list. By
default, this capability structure will not be initialized by
the kernel to enable MSI during the system boot. In other words,
the device function is running in its default pin assertion mode.
Note that in many cases the hardware supporting MSI has bugs,
which may result in system hangs. The software driver of specific
MSI-capable hardware is responsible for deciding whether to call
pci_enable_msi or not. A return of zero indicates the kernel
successfully initialized the MSI/MSI-X capability structure of the
device function. The device function is now running in MSI/MSI-X mode.

5.6 How to tell whether MSI/MSI-X is enabled on device function

At the driver level, a return of zero from the function call of
pci_enable_msi()/pci_enable_msix() indicates to a device driver that
its device function is initialized successfully and ready to run in
MSI/MSI-X mode.

At the user level, users can use the command 'cat /proc/interrupts'
to display the vectors allocated for devices and their interrupt
MSI/MSI-X modes ("PCI-MSI"/"PCI-MSI-X"). The output below shows MSI
mode enabled on a SCSI Adaptec 39320D Ultra320 controller.

           CPU0       CPU1
  0:     324639          0    IO-APIC-edge  timer
  1:       1186          0    IO-APIC-edge  i8042
  2:          0          0          XT-PIC  cascade
 12:       2797          0    IO-APIC-edge  i8042
 14:       6543          0    IO-APIC-edge  ide0
 15:          1          0    IO-APIC-edge  ide1
169:          0          0   IO-APIC-level  uhci-hcd
185:          0          0   IO-APIC-level  uhci-hcd
193:        138         10         PCI-MSI  aic79xx
201:         30          0         PCI-MSI  aic79xx
225:         30          0   IO-APIC-level  aic7xxx
233:         30          0   IO-APIC-level  aic7xxx
NMI:          0          0
LOC:     324553     325068
ERR:          0
MIS:          0

6. MSI quirks

Several PCI chipsets or devices are known to not support MSI.
The PCI stack provides 3 possible levels of MSI disabling:
* on a single device
* on all devices behind a specific bridge
* globally

6.1. Disabling MSI on a single device

Under some circumstances it might be required to disable MSI on a
single device. This may be achieved by either not calling
pci_enable_msi() at all, or by setting the pci_dev->no_msi flag
beforehand (most of the time in a quirk).

6.2. Disabling MSI below a bridge

The vast majority of MSI quirks are required by PCI bridges not
being able to route MSI between busses. In this case, MSI has to be
disabled on all devices behind this bridge. This is achieved by setting
the PCI_BUS_FLAGS_NO_MSI flag in the pci_bus->bus_flags of the bridge
subordinate bus. There is no need to set the same flag on bridges that
are below the broken bridge. When pci_enable_msi() is called to enable
MSI on a device, pci_msi_supported() takes care of checking the NO_MSI
flag in all parent busses of the device.

Some bridges actually support dynamic MSI support enabling/disabling
by changing some bits in their PCI configuration space (especially
the Hypertransport chipsets such as the nVidia nForce and Serverworks
HT2000). It may then be required to update the NO_MSI flag on the
corresponding devices in the sysfs hierarchy. To enable MSI support
on device "0000:00:0e", do:

	echo 1 > /sys/bus/pci/devices/0000:00:0e/msi_bus

To disable MSI support, echo 0 instead of 1. Note that it should be
used with caution since changing this value might break interrupts.

6.3. Disabling MSI globally

Some extreme cases may require disabling MSI globally on the system.
For now, the only known case is a Serverworks PCI-X chipset (MSI is
not supported on several busses that are not all connected to the
chipset in the Linux PCI hierarchy). In the vast majority of other
cases, disabling only behind a specific bridge is enough.

For debugging purposes, the user may also pass pci=nomsi on the kernel
command-line to explicitly disable MSI globally. But, once the
appropriate quirks are added to the kernel, this option should not be
required anymore.

6.4. Finding why MSI cannot be enabled on a device

If MSI is not enabled on a device, you should look at
dmesg to find messages that quirks may output when disabling MSI
on some devices, some bridges or even globally.
Then, lspci -t gives the list of bridges above a device. Reading
/sys/bus/pci/devices/0000:00:0e/msi_bus will tell you whether MSI
is enabled (1) or disabled (0). If 0 is found in a single bridge
msi_bus file above the device, MSI cannot be enabled.

7. FAQ

Q1. Are there any limitations on using the MSI?

A1. If the PCI device supports MSI and conforms to the
specification and the platform supports the APIC local bus,
then using MSI should work.

Q2. Will it work on all the Pentium processors (P3, P4, Xeon,
AMD processors)? In P3 IPI's are transmitted on the APIC local
bus and in P4 and Xeon they are transmitted on the system
bus. Are there any implications with this?

A2. MSI support enables a PCI device to send an inbound
memory write (0xfeexxxxx as target address) on its PCI bus
directly to the FSB. Since the message address has a
redirection hint bit cleared, it should work.

Q3. The target address 0xfeexxxxx will be translated by the
Host Bridge into an interrupt message. Are there any
limitations on the chipsets such as Intel 8xx, Intel e7xxx,
or VIA?

A3. If these chipsets support an inbound memory write with
target address set as 0xfeexxxxx, in conformance with PCI
specification 2.3 or later, then it should work.

Q4. From the driver point of view, if the MSI is lost because
of errors occurring during inbound memory write, then it may
wait forever. Is there a mechanism for it to recover?

A4. Since the target of the transaction is an inbound memory
write, all transaction termination conditions (Retry,
Master-Abort, Target-Abort, or normal completion) are
supported. A device sending an MSI must abide by all the PCI
rules and conditions regarding that inbound memory write. So,
if a retry is signaled it must retry, etc...
We believe that
the recommendation for Abort is also a retry (refer to PCI
specification 2.3 or latest).
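
As a rough illustration of the user-level check described in section
5.6, the sketch below classifies a line of /proc/interrupts by the
labels this guide mentions ("PCI-MSI" and "PCI-MSI-X"). It is a
user-space helper, not part of the kernel API; the names irq_mode,
classify_irq_line() and count_msi_lines() are made up for this example,
and other kernel versions may print different labels.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Interrupt modes this sketch distinguishes (illustrative names). */
enum irq_mode { IRQ_MODE_OTHER = 0, IRQ_MODE_MSI, IRQ_MODE_MSIX };

/*
 * Classify one line of /proc/interrupts by the controller label
 * quoted in this guide. Assumption: the label appears verbatim
 * somewhere in the line.
 */
enum irq_mode classify_irq_line(const char *line)
{
    assert(line != NULL);

    /* Check the longer label first: "PCI-MSI" is a prefix of "PCI-MSI-X". */
    if (strstr(line, "PCI-MSI-X") != NULL)
        return IRQ_MODE_MSIX;
    if (strstr(line, "PCI-MSI") != NULL)
        return IRQ_MODE_MSI;
    return IRQ_MODE_OTHER;
}

/* Count MSI and MSI-X lines in an already opened stream. */
void count_msi_lines(FILE *fp, int *msi, int *msix)
{
    char line[4096];

    *msi = 0;
    *msix = 0;
    while (fgets(line, sizeof(line), fp) != NULL) {
        switch (classify_irq_line(line)) {
        case IRQ_MODE_MSI:
            (*msi)++;
            break;
        case IRQ_MODE_MSIX:
            (*msix)++;
            break;
        default:
            break;
        }
    }
}
```

A caller would typically open /proc/interrupts with fopen(), pass the
stream to count_msi_lines(), and close it afterwards. Applied to the
sample output in section 5.6, the two aic79xx lines would be counted
as MSI and the rest as other.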

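
The check described in section 6.4 can be sketched in the same spirit.
The helper below assumes the caller has already worked out which
bridges sit above the device (for instance from lspci -t) and passes
the path of each msi_bus file in turn. The function names
parse_msi_bus(), read_msi_bus() and msi_possible() are hypothetical
and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Parse the text read from an msi_bus file: 1 means MSI is enabled,
 * 0 means disabled. Anything else is reported as -1.
 */
int parse_msi_bus(const char *text)
{
    int value;

    assert(text != NULL);
    if (sscanf(text, "%d", &value) != 1 || (value != 0 && value != 1))
        return -1;
    return value;
}

/* Read one msi_bus file; returns 1, 0, or -1 if it cannot be read. */
int read_msi_bus(const char *path)
{
    char buf[16];
    int value = -1;
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return -1;
    if (fgets(buf, sizeof(buf), fp) != NULL)
        value = parse_msi_bus(buf);
    fclose(fp);
    return value;
}

/*
 * Given the msi_bus value of every bridge above a device, report
 * whether MSI can be enabled. A single 0 blocks MSI; this sketch
 * also treats an unreadable value (-1) as blocking.
 */
int msi_possible(const int *bridge_values, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++)
        if (bridge_values[i] != 1)
            return 0;
    return 1;
}
```

Treating an unreadable file as blocking is a conservative choice made
for this sketch; a tool that only wants to report known-bad bridges
could skip such entries instead.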