                        Dynamic DMA mapping
                        ===================

                 David S. Miller
                 Richard Henderson
                  Jakub Jelinek

Most of the 64bit platforms have special hardware that translates bus
addresses (DMA addresses) into physical addresses.  This is similar to
how page tables and/or a TLB translates virtual addresses to physical
addresses on a CPU.  This is needed so that e.g. PCI devices can
access with a Single Address Cycle (32bit DMA address) any page in the
64bit physical address space.  Previously in Linux those 64bit
platforms had to set artificial limits on the maximum RAM size in the
system, so that the virt_to_bus() static scheme works (the DMA address
translation tables were simply filled on bootup to map each bus
address to the physical page __pa(bus_to_virt())).

So that Linux can use the dynamic DMA mapping, it needs some help from the
drivers, namely it has to take into account that DMA addresses should be
mapped only for the time they are actually used and unmapped after the DMA
transfer.

The following API will, of course, work even on platforms where no such
hardware exists; see e.g. include/asm-i386/pci.h for how it is implemented
on top of the virt_to_bus interface.

First of all, you should make sure

#include <linux/pci.h>

is in your driver. This file will obtain for you the definition of the
dma_addr_t type (which can hold any valid DMA address for the platform),
which should be used everywhere you hold a DMA (bus) address
returned from the DMA mapping functions.
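
For example, a driver will typically keep the dma_addr_t returned by the
mapping functions next to the CPU pointer it describes.  A minimal sketch
(the structure and member names are made up for illustration):

        struct mydev_rx_ring {
                void            *descs;        /* CPU pointer to the descriptors */
                dma_addr_t      descs_dma;     /* bus address handed to the card */
                size_t          descs_size;    /* length of the region in bytes */
        };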

                         What memory is DMA'able?

The first piece of information you must know is what kernel memory can
be used with the DMA mapping facilities.  There has been an unwritten
set of rules regarding this, and this text is an attempt to finally
write them down.

If you acquired your memory via the page allocator
(i.e. __get_free_page*()) or the generic memory allocators
(i.e. kmalloc() or kmem_cache_alloc()) then you may DMA to/from
that memory using the addresses returned from those routines.

This means specifically that you may _not_ use the memory/addresses
returned from vmalloc() for DMA.  It is possible to DMA to the
_underlying_ memory mapped into a vmalloc() area, but this requires
walking page tables to get the physical addresses, and then
translating each of those pages back to a kernel address using
something like __va().  [ EDIT: Update this when we integrate
Gerd Knorr's generic code which does this. ]

This rule also means that you may not use kernel image addresses
(ie. items in the kernel's data/text/bss segment, or your driver's)
nor may you use kernel stack addresses for DMA.  Both of these items
might be mapped somewhere entirely different than the rest of physical
memory.

Also, this means that you cannot take the return of a kmap()
call and DMA to/from that.  This is similar to vmalloc().

What about block I/O and networking buffers?  The block I/O and
networking subsystems make sure that the buffers they use are valid
for you to DMA from/to.
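
To illustrate these rules, here is a sketch of what may and may not be
handed to the DMA mapping functions (the sizes and flags are arbitrary):

        /* OK: memory from kmalloc() may be used for DMA. */
        char *good_buf = kmalloc(1024, GFP_KERNEL);

        /* OK: so may memory from the page allocator. */
        unsigned long page_buf = __get_free_pages(GFP_KERNEL, 0);

        /* NOT OK: vmalloc() memory may not be passed to the DMA
         * mapping functions directly.
         */
        char *bad_buf = vmalloc(16 * 1024);

        /* NOT OK: neither may on-stack or kernel image buffers. */
        char stack_buf[64];
        static char static_buf[64];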

                        DMA addressing limitations

Does your device have any DMA addressing limitations?  For example, is
your device only capable of driving the low order 24-bits of address
on the PCI bus for SAC DMA transfers?  If so, you need to inform the
PCI layer of this fact.

By default, the kernel assumes that your device can address the full
32-bits in a SAC cycle.  For a 64-bit DAC capable device, this needs
to be increased.  And for a device with limitations, as discussed in
the previous paragraph, it needs to be decreased.

For correct operation, you must interrogate the PCI layer in your
device probe routine to see if the PCI controller on the machine can
properly support the DMA addressing limitation your device has.  It is
good style to do this even if your device holds the default setting,
because this shows that you did think about these issues wrt. your
device.

The query is performed via a call to pci_set_dma_mask():

        int pci_set_dma_mask(struct pci_dev *pdev, u64 device_mask);

Here, pdev is a pointer to the PCI device struct of your device, and
device_mask is a bit mask describing which bits of a PCI address your
device supports.  It returns zero if your card can perform DMA
properly on the machine given the address mask you provided.

If it returns non-zero, your device cannot perform DMA properly on
this platform, and attempting to do so will result in undefined
behavior.  You must either use a different mask, or not use DMA.

This means that in the failure case, you have three options:

1) Use another DMA mask, if possible (see below).
2) Use some non-DMA mode for data transfer, if possible.
3) Ignore this device and do not initialize it.

It is recommended that your driver print a kernel KERN_WARNING message
when you end up performing either #2 or #3.  In this manner, if a user
of your driver reports that performance is bad or that the device is not
even detected, you can ask them for the kernel messages to find out
exactly why.

The standard 32-bit addressing PCI device would do something like
this:

        if (pci_set_dma_mask(pdev, 0xffffffff)) {
                printk(KERN_WARNING
                       "mydev: No suitable DMA available.\n");
                goto ignore_this_device;
        }

Another common scenario is a 64-bit capable device.  The approach
here is to try for 64-bit DAC addressing, but back down to a
32-bit mask should that fail.  The PCI platform code may fail the
64-bit mask not because the platform is not capable of 64-bit
addressing.  Rather, it may fail in this case simply because
32-bit SAC addressing is done more efficiently than DAC addressing.
Sparc64 is one platform which behaves in this way.

Here is how you would handle a 64-bit capable device which can drive
all 64-bits during a DAC cycle:

        int using_dac;

        if (!pci_set_dma_mask(pdev, 0xffffffffffffffff)) {
                using_dac = 1;
        } else if (!pci_set_dma_mask(pdev, 0xffffffff)) {
                using_dac = 0;
        } else {
                printk(KERN_WARNING
                       "mydev: No suitable DMA available.\n");
                goto ignore_this_device;
        }

If your 64-bit device is going to be an enormous consumer of DMA
mappings, this can be problematic since the DMA mappings are a
finite resource on many platforms.  Please see the "DAC Addressing
for Address Space Hungry Devices" section near the end of this
document for how to handle this case.

Finally, if your device can only drive the low 24-bits of
address during PCI bus mastering you might do something like:

        if (pci_set_dma_mask(pdev, 0x00ffffff)) {
                printk(KERN_WARNING
                       "mydev: 24-bit DMA addressing not available.\n");
                goto ignore_this_device;
        }

When pci_set_dma_mask() is successful, and returns zero, the PCI layer
saves away this mask you have provided.  The PCI layer will use this
information later when you make DMA mappings.

There is a case which we are aware of at this time, which is worth
mentioning in this documentation.  If your device supports multiple
functions (for example a sound card provides playback and record
functions) and the various different functions have _different_
DMA addressing limitations, you may wish to probe each mask and
only provide the functionality which the machine can handle.  It
is important that the last call to pci_set_dma_mask() be for the
most specific mask.

Here is pseudo-code showing how this might be done:

        #define PLAYBACK_ADDRESS_BITS   0xffffffff
        #define RECORD_ADDRESS_BITS     0x00ffffff

        struct my_sound_card *card;
        struct pci_dev *pdev;

        ...
        if (!pci_set_dma_mask(pdev, PLAYBACK_ADDRESS_BITS)) {
                card->playback_enabled = 1;
        } else {
                card->playback_enabled = 0;
                printk(KERN_WARNING "%s: Playback disabled due to DMA limitations.\n",
                       card->name);
        }
        if (!pci_set_dma_mask(pdev, RECORD_ADDRESS_BITS)) {
                card->record_enabled = 1;
        } else {
                card->record_enabled = 0;
                printk(KERN_WARNING "%s: Record disabled due to DMA limitations.\n",
                       card->name);
        }

A sound card was used as an example here because this genre of PCI
devices seems to be littered with ISA chips given a PCI front end,
and thus retaining the 16MB DMA addressing limitations of ISA.

                        Types of DMA mappings

There are two types of DMA mappings:

- Consistent DMA mappings which are usually mapped at driver
  initialization, unmapped at the end and for which the hardware should
  guarantee that the device and the CPU can access the data
  in parallel and will see updates made by each other without any
  explicit software flushing.

  Think of "consistent" as "synchronous" or "coherent".

  Consistent DMA mappings are always SAC addressable.  That is
  to say, consistent DMA addresses given to the driver will always
  be in the low 32-bits of the PCI bus space.

  Good examples of what to use consistent mappings for are:

        - Network card DMA ring descriptors.
        - SCSI adapter mailbox command data structures.
        - Device firmware microcode executed out of
          main memory.

  The invariant these examples all require is that any CPU store
  to memory is immediately visible to the device, and vice
  versa.  Consistent mappings guarantee this.

  IMPORTANT: Consistent DMA memory does not preclude the usage of
             proper memory barriers.  The CPU may reorder stores to
             consistent memory just as it may normal memory.  Example:
             if it is important for the device to see the first word
             of a descriptor updated before the second, you must do
             something like:

                desc->word0 = address;
                wmb();
                desc->word1 = DESC_VALID;

             in order to get correct behavior on all platforms.

- Streaming DMA mappings which are usually mapped for one DMA transfer,
  unmapped right after it (unless you use pci_dma_sync below) and for which
  hardware can optimize for sequential accesses.

  Think of "streaming" as "asynchronous" or "outside the coherency
  domain".

  Good examples of what to use streaming mappings for are:

        - Networking buffers transmitted/received by a device.
        - Filesystem buffers written/read by a SCSI device.

  The interfaces for using this type of mapping were designed in
  such a way that an implementation can make whatever performance
  optimizations the hardware allows.  To this end, when using
  such mappings you must be explicit about what you want to happen.

Neither type of DMA mapping has alignment restrictions that come
from PCI, although some devices may have such restrictions.

                 Using Consistent DMA mappings.

To allocate and map large (PAGE_SIZE or so) consistent DMA regions,
you should do:

        dma_addr_t dma_handle;

        cpu_addr = pci_alloc_consistent(dev, size, &dma_handle);

where dev is a struct pci_dev *. You should pass NULL for PCI-like buses
where devices don't have struct pci_dev (like ISA, EISA).  This may be
called in interrupt context.

This argument is needed because the DMA translations may be bus
specific (and often are private to the bus which the device is attached
to).

Size is the length of the region you want to allocate, in bytes.

This routine will allocate RAM for that region, so it acts similarly to
__get_free_pages (but takes size instead of a page order).  If your
driver needs regions sized smaller than a page, you may prefer using
the pci_pool interface, described below.

The consistent DMA mapping interfaces, for non-NULL dev, will always
return a DMA address which is SAC (Single Address Cycle) addressable.
Even if the device indicates (via PCI dma mask) that it may address
the upper 32-bits and thus perform DAC cycles, consistent allocation
will still only return 32-bit PCI addresses for DMA.  This is true
of the pci_pool interface as well.

In fact, as mentioned above, all consistent memory provided by the
kernel DMA APIs is always SAC addressable.

pci_alloc_consistent returns two values: the virtual address which you
can use to access it from the CPU and dma_handle which you pass to the
card.

The cpu return address and the DMA bus master address are both
guaranteed to be aligned to the smallest PAGE_SIZE order which
is greater than or equal to the requested size.  This invariant
exists (for example) to guarantee that if you allocate a chunk
which is smaller than or equal to 64 kilobytes, the extent of the
buffer you receive will not cross a 64K boundary.

To unmap and free such a DMA region, you call:

        pci_free_consistent(dev, size, cpu_addr, dma_handle);

where dev, size are the same as in the above call and cpu_addr and
dma_handle are the values pci_alloc_consistent returned to you.
This function may not be called in interrupt context.
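
Putting the two calls together, a driver might allocate a small
descriptor area at probe time and free it again on removal.  A minimal
sketch, with made-up structure and size names:

        #define MYDEV_DESC_BYTES        1024

        static int mydev_alloc_descs(struct mydev *mp, struct pci_dev *pdev)
        {
                mp->descs = pci_alloc_consistent(pdev, MYDEV_DESC_BYTES,
                                                 &mp->descs_dma);
                if (mp->descs == NULL)
                        return -ENOMEM;
                return 0;
        }

        static void mydev_free_descs(struct mydev *mp, struct pci_dev *pdev)
        {
                /* Not from interrupt context, as noted above. */
                pci_free_consistent(pdev, MYDEV_DESC_BYTES,
                                    mp->descs, mp->descs_dma);
        }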

If your driver needs lots of smaller memory regions, you can write
custom code to subdivide pages returned by pci_alloc_consistent,
or you can use the pci_pool API to do that.  A pci_pool is like
a kmem_cache, but it uses pci_alloc_consistent not __get_free_pages.
Also, it understands common hardware constraints for alignment,
like queue heads needing to be aligned on N byte boundaries.

Create a pci_pool like this:

        struct pci_pool *pool;

        pool = pci_pool_create(name, dev, size, align, alloc, flags);

The "name" is for diagnostics (like a kmem_cache name); dev and size
are as above.  The device's hardware alignment requirement for this
type of data is "align" (which is expressed in bytes, and must be a
power of two).  The flags are SLAB_ flags as you'd pass to
kmem_cache_create.  Not all flags are understood, but SLAB_POISON may
help you find driver bugs.  If you call this in a non-sleeping
context (e.g. in_interrupt is true or while holding SMP locks), pass
SLAB_ATOMIC.  If your device has no boundary crossing restrictions,
pass 0 for alloc; passing 4096 says memory allocated from this pool
must not cross 4KByte boundaries (but at that time it may be better to
go for pci_alloc_consistent directly instead).

Allocate memory from a pci_pool like this:

        cpu_addr = pci_pool_alloc(pool, flags, &dma_handle);

flags are SLAB_KERNEL if blocking is permitted (not in_interrupt nor
holding SMP locks), SLAB_ATOMIC otherwise.  Like pci_alloc_consistent,
this returns two values, cpu_addr and dma_handle.

Free memory that was allocated from a pci_pool like this:

        pci_pool_free(pool, cpu_addr, dma_handle);

where pool is what you passed to pci_pool_alloc, and cpu_addr and
dma_handle are the values pci_pool_alloc returned. This function
may be called in interrupt context.

Destroy a pci_pool by calling:

        pci_pool_destroy(pool);

Make sure you've called pci_pool_free for all memory allocated
from a pool before you destroy the pool. This function may not
be called in interrupt context.
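
Tying these calls together, a driver that needs many small, aligned
blocks might do something like the following sketch (the pool name,
sizes, and usage are illustrative only):

        struct pci_pool *pool;
        void *cmd;
        dma_addr_t cmd_dma;

        /* 32-byte blocks, 32-byte alignment, no boundary restriction. */
        pool = pci_pool_create("mydev_cmds", pdev, 32, 32, 0, 0);
        if (pool == NULL)
                goto no_pool;

        cmd = pci_pool_alloc(pool, SLAB_KERNEL, &cmd_dma);
        if (cmd == NULL)
                goto no_cmd;

        /* ... hand cmd_dma to the device, touch cmd from the CPU ... */

        pci_pool_free(pool, cmd, cmd_dma);
        pci_pool_destroy(pool);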

                        DMA Direction

The interfaces described in subsequent portions of this document
take a DMA direction argument, which is an integer and takes on
one of the following values:

 PCI_DMA_BIDIRECTIONAL
 PCI_DMA_TODEVICE
 PCI_DMA_FROMDEVICE
 PCI_DMA_NONE

You should provide the exact DMA direction if you know it.

PCI_DMA_TODEVICE means "from main memory to the PCI device"
PCI_DMA_FROMDEVICE means "from the PCI device to main memory"
It is the direction in which the data moves during the DMA
transfer.

You are _strongly_ encouraged to specify this as precisely
as you possibly can.

If you absolutely cannot know the direction of the DMA transfer,
specify PCI_DMA_BIDIRECTIONAL.  It means that the DMA can go in
either direction.  The platform guarantees that you may legally
specify this, and that it will work, but this may be at the
cost of performance for example.

The value PCI_DMA_NONE is to be used for debugging.  One can
hold this in a data structure before you come to know the
precise direction, and this will help catch cases where your
direction tracking logic has failed to set things up properly.

Another advantage of specifying this value precisely (outside of
potential platform-specific optimizations of such) is for debugging.
Some platforms actually have a write permission boolean which DMA
mappings can be marked with, much like page protections in the user
program address space.  Such platforms can and do report errors in the
kernel logs when the PCI controller hardware detects violation of the
permission setting.

Only streaming mappings specify a direction, consistent mappings
implicitly have a direction attribute setting of
PCI_DMA_BIDIRECTIONAL.

The SCSI subsystem provides mechanisms for you to easily obtain
the direction to use, in the SCSI command:

        scsi_to_pci_dma_dir(SCSI_DIRECTION)

Where SCSI_DIRECTION is obtained from the 'sc_data_direction'
member of the SCSI command your driver is working on.  The
interface mentioned above returns a value suitable for passing
into the streaming DMA mapping interfaces below.

For networking drivers, it's a rather simple affair.  For transmit
packets, map/unmap them with the PCI_DMA_TODEVICE direction
specifier.  For receive packets, just the opposite, map/unmap them
with the PCI_DMA_FROMDEVICE direction specifier.

                  Using Streaming DMA mappings

The streaming DMA mapping routines can be called from interrupt
context.  There are two versions of each map/unmap, one which will
map/unmap a single memory region, and one which will map/unmap a
scatterlist.

To map a single region, you do:

        struct pci_dev *pdev = mydev->pdev;
        dma_addr_t dma_handle;
        void *addr = buffer->ptr;
        size_t size = buffer->len;

        dma_handle = pci_map_single(pdev, addr, size, direction);

and to unmap it:

        pci_unmap_single(pdev, dma_handle, size, direction);

You should call pci_unmap_single when the DMA activity is finished, e.g.
from the interrupt which told you that the DMA transfer is done.
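
For instance, a network driver might map each transmit packet with the
PCI_DMA_TODEVICE direction discussed earlier and unmap it from its
transmit-completion interrupt.  A rough sketch (the mydev bookkeeping
fields are made up):

        /* Transmit path: map the packet data for the card. */
        mapping = pci_map_single(pdev, skb->data, skb->len,
                                 PCI_DMA_TODEVICE);
        mydev->tx_skb = skb;
        mydev->tx_dma = mapping;
        mydev->tx_len = skb->len;

        /* ... later, in the TX-completion interrupt ... */
        pci_unmap_single(pdev, mydev->tx_dma, mydev->tx_len,
                         PCI_DMA_TODEVICE);
        dev_kfree_skb_irq(mydev->tx_skb);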

Using cpu pointers like this for single mappings has a disadvantage:
you cannot reference HIGHMEM memory in this way.  Thus, there is a
map/unmap interface pair akin to pci_{map,unmap}_single.  These
interfaces deal with page/offset pairs instead of cpu pointers.
Specifically:

        struct pci_dev *pdev = mydev->pdev;
        dma_addr_t dma_handle;
        struct page *page = buffer->page;
        unsigned long offset = buffer->offset;
        size_t size = buffer->len;

        dma_handle = pci_map_page(pdev, page, offset, size, direction);

        ...

        pci_unmap_page(pdev, dma_handle, size, direction);

Here, "offset" means byte offset within the given page.

With scatterlists, you map a region gathered from several regions by:

        int i, count = pci_map_sg(dev, sglist, nents, direction);
        struct scatterlist *sg;

        for (i = 0, sg = sglist; i < count; i++, sg++) {
                hw_address[i] = sg_dma_address(sg);
                hw_len[i] = sg_dma_len(sg);
        }

where nents is the number of entries in the sglist.

The implementation is free to merge several consecutive sglist entries
into one (e.g. if DMA mapping is done with PAGE_SIZE granularity, any
consecutive sglist entries can be merged into one provided the first one
ends and the second one starts on a page boundary - in fact this is a huge
advantage for cards which either cannot do scatter-gather or have very
limited number of scatter-gather entries) and returns the actual number
of sg entries it mapped them to.

Then you should loop count times (note: this can be less than nents times)
and use sg_dma_address() and sg_dma_len() macros where you previously
accessed sg->address and sg->length as shown above.

To unmap a scatterlist, just call:

        pci_unmap_sg(dev, sglist, nents, direction);

Again, make sure DMA activity has already finished.

PLEASE NOTE:  The 'nents' argument to the pci_unmap_sg call must be
              the _same_ one you passed into the pci_map_sg call,
              it should _NOT_ be the 'count' value _returned_ from the
              pci_map_sg call.
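
In other words, the complete sequence looks something like the sketch
below, where nents is used for both the map and unmap calls and the
returned count only drives the programming of the device
(program_device_sg_entry is a made-up helper):

        count = pci_map_sg(dev, sglist, nents, direction);

        for (i = 0, sg = sglist; i < count; i++, sg++)
                program_device_sg_entry(i, sg_dma_address(sg),
                                        sg_dma_len(sg));

        /* ... wait for the DMA to complete ... */

        pci_unmap_sg(dev, sglist, nents, direction);    /* nents, not count */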

Every pci_map_{single,sg} call should have its pci_unmap_{single,sg}
counterpart, because the bus address space is a shared resource (although
in some ports the mapping is per each BUS so fewer devices contend for the
same bus address space) and you could render the machine unusable by eating
all bus addresses.

If you need to use the same streaming DMA region multiple times and touch
the data in between the DMA transfers, just map it with
pci_map_{single,sg}, and after each DMA transfer call either:

        pci_dma_sync_single(dev, dma_handle, size, direction);

or:

        pci_dma_sync_sg(dev, sglist, nents, direction);

as appropriate.

After the last DMA transfer call one of the DMA unmap routines
pci_unmap_{single,sg}. If you don't touch the data from the first pci_map_*
call till pci_unmap_*, then you don't have to call the pci_dma_sync_*
routines at all.

Here is pseudo code which shows a situation in which you would need
to use the pci_dma_sync_*() interfaces.

        static void my_card_setup_receive_buffer(struct my_card *cp, char *buffer, int len)
        {
                dma_addr_t mapping;

                mapping = pci_map_single(cp->pdev, buffer, len, PCI_DMA_FROMDEVICE);

                cp->rx_buf = buffer;
                cp->rx_len = len;
                cp->rx_dma = mapping;

                give_rx_buf_to_card(cp);
        }

        ...

        static void my_card_interrupt_handler(int irq, void *devid, struct pt_regs *regs)
        {
                struct my_card *cp = devid;

                ...
                if (read_card_status(cp) == RX_BUF_TRANSFERRED) {
                        struct my_card_header *hp;

                        /* Examine the header to see if we wish
                         * to accept the data.  But synchronize
                         * the DMA transfer with the CPU first
                         * so that we see updated contents.
                         */
                        pci_dma_sync_single(cp->pdev, cp->rx_dma, cp->rx_len,
                                            PCI_DMA_FROMDEVICE);

                        /* Now it is safe to examine the buffer. */
                        hp = (struct my_card_header *) cp->rx_buf;
                        if (header_is_ok(hp)) {
                                pci_unmap_single(cp->pdev, cp->rx_dma, cp->rx_len,
                                                 PCI_DMA_FROMDEVICE);
                                pass_to_upper_layers(cp->rx_buf);
                                make_and_setup_new_rx_buf(cp);
                        } else {
                                /* Just give the buffer back to the card. */
                                give_rx_buf_to_card(cp);
                        }
                }
        }

Drivers converted fully to this interface should not use virt_to_bus any
longer, nor should they use bus_to_virt. Some drivers have to be changed a
little bit, because there is no longer an equivalent to bus_to_virt in the
dynamic DMA mapping scheme - you have to always store the DMA addresses
returned by the pci_alloc_consistent, pci_pool_alloc, and pci_map_single
calls (pci_map_sg stores them in the scatterlist itself if the platform
supports dynamic DMA mapping in hardware) in your driver structures and/or
in the card registers.
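
For example, where an old driver wrote virt_to_bus(buffer) into a chip
register, a converted driver writes the address obtained from the
mapping call instead.  A sketch (the register offset and the cp->regs
MMIO pointer are invented for illustration):

        cp->rx_dma = pci_map_single(cp->pdev, cp->rx_buf, cp->rx_len,
                                    PCI_DMA_FROMDEVICE);

        /* Tell the card where to DMA the next receive packet. */
        writel(cp->rx_dma, cp->regs + MYDEV_RX_DMA_ADDR);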

All PCI drivers should be using these interfaces with no exceptions.
It is planned to completely remove virt_to_bus() and bus_to_virt() as
they are entirely deprecated.  Some ports already do not provide these
as it is impossible to correctly support them.

                64-bit DMA and DAC cycle support

Do you understand all of the text above?  Great, then you already
know how to use 64-bit DMA addressing under Linux.  Simply make
the appropriate pci_set_dma_mask() calls based upon your card's
capabilities, then use the mapping APIs above.

It is that simple.

Well, not for some odd devices.  See the next section for information
about that.

        DAC Addressing for Address Space Hungry Devices

There exists a class of devices which do not mesh well with the PCI
DMA mapping API.  By definition these "mappings" are a finite
resource.  The number of total available mappings per bus is platform
specific, but there will always be a reasonable amount.

What is "reasonable"?  Reasonable means that networking and block I/O
devices need not worry about using too many mappings.

As an example of a problematic device, consider compute cluster cards.
They can potentially need to access gigabytes of memory at once via
DMA.  Dynamic mappings are unsuitable for this kind of access pattern.

To this end we've provided a small API by which a device driver
may use DAC cycles to directly address all of physical memory.
Not all platforms support this, but most do.  It is easy to determine
whether the platform will work properly at probe time.

First, understand that there may be a SEVERE performance penalty for
using these interfaces on some platforms.  Therefore, you MUST only
use these interfaces if it is absolutely required.  99% of devices can
use the normal APIs without any problems.

Note that for streaming type mappings you must either use these
interfaces, or the dynamic mapping interfaces above.  You may not mix
usage of both for the same device.  Such an act is illegal and is
guaranteed to put a banana in your tailpipe.

However, consistent mappings may in fact be used in conjunction with
these interfaces.  Remember that, as defined, consistent mappings are
always going to be SAC addressable.

The first thing your driver needs to do is query the PCI platform
layer with your device's DAC addressing capabilities:

        int pci_dac_set_dma_mask(struct pci_dev *pdev, u64 mask);

This routine behaves identically to pci_set_dma_mask.  You may not
use the following interfaces if this routine fails.

Next, DMA addresses using this API are kept track of using the
dma64_addr_t type.  It is guaranteed to be big enough to hold any
DAC address the platform layer will give to you from the following
routines.  If you have consistent mappings as well, you still
use plain dma_addr_t to keep track of those.

All mappings obtained here will be direct.  The mappings are not
translated, and this is the purpose of this dialect of the DMA API.

All routines work with page/offset pairs.  This is the _ONLY_ way to
portably refer to any piece of memory.  If you have a cpu pointer
(which may be validly DMA'd too) you may easily obtain the page
and offset using something like this:

        struct page *page = virt_to_page(ptr);
        unsigned long offset = ((unsigned long)ptr & ~PAGE_MASK);

Here are the interfaces:

        dma64_addr_t pci_dac_page_to_dma(struct pci_dev *pdev,
                                         struct page *page,
                                         unsigned long offset,
                                         int direction);

The DAC address for the tuple PAGE/OFFSET is returned.  The direction
argument is the same as for pci_{map,unmap}_single().  The same rules
for cpu/device access apply here as for the streaming mapping
interfaces.  To reiterate:

        The cpu may touch the buffer before pci_dac_page_to_dma.
        The device may touch the buffer after the pci_dac_page_to_dma
        call is made, but the cpu may NOT.

When the DMA transfer is complete, invoke:

        void pci_dac_dma_sync_single(struct pci_dev *pdev,
                                     dma64_addr_t dma_addr,
                                     size_t len, int direction);

This must be done before the CPU looks at the buffer again.
This interface behaves identically to pci_dma_sync_{single,sg}().

If you need to get back to the PAGE/OFFSET tuple from a dma64_addr_t
the following interfaces are provided:

        struct page *pci_dac_dma_to_page(struct pci_dev *pdev,
                                         dma64_addr_t dma_addr);
        unsigned long pci_dac_dma_to_offset(struct pci_dev *pdev,
                                            dma64_addr_t dma_addr);

This is possible with the DAC interfaces purely because they are
not translated in any way.
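
Putting it together, a driver for such an address-space-hungry device
might do something along these lines (a sketch only; error handling is
omitted and program_card_buffer is a made-up helper):

        dma64_addr_t dac_addr;

        /* At probe time: may we use DAC addressing at all? */
        if (pci_dac_set_dma_mask(pdev, 0xffffffffffffffff))
                goto no_dac;

        /* Map one page for the device to write into and hand it over. */
        dac_addr = pci_dac_page_to_dma(pdev, page, offset,
                                       PCI_DMA_FROMDEVICE);
        program_card_buffer(card, dac_addr, len);

        /* After the transfer completes, before the CPU reads the data. */
        pci_dac_dma_sync_single(pdev, dac_addr, len, PCI_DMA_FROMDEVICE);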

                Optimizing Unmap State Space Consumption

On many platforms, pci_unmap_{single,page}() is simply a nop.
Therefore, keeping track of the mapping address and length is a waste
of space.  Instead of filling your drivers up with ifdefs and the like
to "work around" this (which would defeat the whole purpose of a
portable API) the following facilities are provided.

Actually, instead of describing the macros one by one, we'll
transform some example code.

1) Use DECLARE_PCI_UNMAP_{ADDR,LEN} in state saving structures.
   Example, before:

        struct ring_state {
                struct sk_buff *skb;
                dma_addr_t mapping;
                __u32 len;
        };

   after:

        struct ring_state {
                struct sk_buff *skb;
                DECLARE_PCI_UNMAP_ADDR(mapping)
                DECLARE_PCI_UNMAP_LEN(len)
        };

   NOTE: DO NOT put a semicolon at the end of the DECLARE_*()
         macro.

2) Use pci_unmap_{addr,len}_set to set these values.
   Example, before:

        ringp->mapping = FOO;
        ringp->len = BAR;

   after:

        pci_unmap_addr_set(ringp, mapping, FOO);
        pci_unmap_len_set(ringp, len, BAR);

3) Use pci_unmap_{addr,len} to access these values.
   Example, before:

        pci_unmap_single(pdev, ringp->mapping, ringp->len,
                         PCI_DMA_FROMDEVICE);

   after:

        pci_unmap_single(pdev,
                         pci_unmap_addr(ringp, mapping),
                         pci_unmap_len(ringp, len),
                         PCI_DMA_FROMDEVICE);

It really should be self-explanatory.  We treat the ADDR and LEN
separately, because it is possible for an implementation to only
need the address in order to perform the unmap operation.

                        Platform Issues

If you are just writing drivers for Linux and do not maintain
an architecture port for the kernel, you can safely skip down
to "Closing".

1) Struct scatterlist requirements.

   Struct scatterlist must contain, at a minimum, the following
   members:

        char *address;
        struct page *page;
        unsigned int offset;
        unsigned int length;

   The "address" member will disappear in 2.5.x.

   This means that your pci_{map,unmap}_sg() and all other
   interfaces dealing with scatterlists must be able to cope
   properly with page being non-NULL.

   A scatterlist is in one of two states.  The base address is
   either specified by "address" or by a "page+offset" pair.
   If "address" is NULL, then "page+offset" is being used.
   If "page" is NULL, then "address" is being used.

   In 2.5.x, all scatterlists will use "page+offset".  But during
   2.4.x we still have to support the old method.

2) More to come...

                           Closing

This document, and the API itself, would not be in its current
form without the feedback and suggestions from numerous individuals.
We would like to specifically mention, in no particular order, the
following people:

        Russell King
        Leo Dagum
        Ralf Baechle
        Grant Grundler
        Jay Estabrook
        Thomas Sailer
        Andrea Arcangeli
        Jens Axboe
        David Mosberger-Tang
