  Unreliable Guide To Hacking The Linux Kernel

Paul "Rusty" Russell
rusty@rustcorp.com.au

Copyright 2001 Rusty Russell

This documentation is free software; you can redistribute
it and/or modify it under the terms of the GNU General Public
License as published by the Free Software Foundation; either
version 2 of the License, or (at your option) any later
version.

This program is distributed in the hope that it will be
useful, but WITHOUT ANY WARRANTY; without even the implied
warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details.

You should have received a copy of the GNU General Public
License along with this program; if not, write to the Free
Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
MA 02111-1307 USA

For more details see the file COPYING in the source
distribution of Linux.

This is the first release of this document as part of the kernel tarball.

Introduction

Welcome, gentle reader, to Rusty's Unreliable Guide to Linux
Kernel Hacking.  This document describes the common routines and
general requirements for kernel code: its goal is to serve as a
primer for Linux kernel development for experienced C
programmers.  I avoid implementation details: that's what the
code is for, and I ignore whole tracts of useful routines.

Before you read this, please understand that I never wanted to
write this document, being grossly under-qualified, but I always
wanted to read it, and this was the only way.  I hope it will
grow into a compendium of best practice, common starting points
and random information.

The Players

At any time each of the CPUs in a system can be:

    not associated with any process, serving a hardware interrupt;

    not associated with any process, serving a softirq, tasklet or bh;

    running in kernel space, associated with a process;

    running a process in user space.

There is a strict ordering between these: other than the last
category (userspace) each can only be pre-empted by those above.
For example, while a softirq is running on a CPU, no other
softirq will pre-empt it, but a hardware interrupt can.  However,
any other CPUs in the system execute independently.

We'll see a number of ways that the user context can block
interrupts, to become truly non-preemptable.

User Context

User context is when you are coming in from a system call or
other trap: you can sleep, and you own the CPU (except for
interrupts) until you call schedule().
In other words, user context (unlike userspace) is not pre-emptable.

You are always in user context on module load and unload,
and on operations on the block device layer.

In user context, the current pointer (indicating
the task we are currently executing) is valid, and
in_interrupt() (include/asm/hardirq.h) is false.

Beware that if you have interrupts or bottom halves disabled
(see below), in_interrupt() will return a false positive.

Hardware Interrupts (Hard IRQs)

Timer ticks, network cards and keyboards are examples of real
hardware which produce interrupts at any time.  The kernel runs
interrupt handlers, which service the hardware.  The kernel
guarantees that a handler is never re-entered: if another
interrupt arrives, it is queued (or dropped).  Because it
disables interrupts, the handler has to be fast: frequently it
simply acknowledges the interrupt, marks a `software interrupt'
for execution and exits.

You can tell you are in a hardware interrupt, because
in_irq() returns true.

Beware that this will return a false positive if interrupts are disabled
(see below).

Software Interrupt Context: Bottom Halves, Tasklets, softirqs

Whenever a system call is about to return to userspace, or a
hardware interrupt handler exits, any `software interrupts'
which are marked pending (usually by hardware interrupts) are
run (kernel/softirq.c).

Much of the real interrupt handling work is done here.  Early in
the transition to SMP, there were only `bottom halves' (BHs),
which didn't take advantage of multiple CPUs.  Shortly
after we switched from wind-up computers made of match-sticks and snot,
we abandoned this limitation.

include/linux/interrupt.h lists the
different BHs.  No matter how many CPUs you have, no two BHs will run at
the same time. This made the transition to SMP simpler, but sucks hard for
scalable performance.  A very important bottom half is the timer
BH (include/linux/timer.h): you
can register to have it call functions for you in a given length of time.

2.3.43 introduced softirqs, and re-implemented the (now
deprecated) BHs underneath them.  Softirqs are fully-SMP
versions of BHs: they can run on as many CPUs at once as
required.  This means they need to deal with any races in shared
data using their own locks.  A bitmask is used to keep track of
which are enabled, so the 32 available softirqs should not be
used up lightly.  (Yes, people will notice).

tasklets (include/linux/interrupt.h)
are like softirqs, except they are dynamically-registrable (meaning you
can have as many as you want), and they also guarantee that any tasklet
will only run on one CPU at any time, although different tasklets can
run simultaneously (unlike different BHs).

The name `tasklet' is misleading: they have nothing to do with `tasks',
and probably more to do with some bad vodka Alexey Kuznetsov had at the
time.

You can tell you are in a softirq (or bottom half, or tasklet)
using the in_softirq() macro
(include/asm/softirq.h).

Beware that this will return a false positive if a bh lock (see below)
is held.

Some Basic Rules

No memory protection
    If you corrupt memory, whether in user context or
    interrupt context, the whole machine will crash.  Are you
    sure you can't do what you want in userspace?

No floating point or MMX
    The FPU context is not saved; even in user
    context the FPU state probably won't
    correspond with the current process: you would mess with some
    user process' FPU state.  If you really want
    to do this, you would have to explicitly save/restore the full
    FPU state (and avoid context switches).  It
    is generally a bad idea; use fixed point arithmetic first.

A rigid stack limit
    The kernel stack is about 6K in 2.2 (for most
    architectures: it's about 14K on the Alpha), and shared
    with interrupts so you can't use it all.  Avoid deep
    recursion and huge local arrays on the stack (allocate
    them dynamically instead).

The Linux kernel is portable
    Let's keep it that way.  Your code should be 64-bit clean,
    and endian-independent.  You should also minimize CPU
    specific stuff, e.g. inline assembly should be cleanly
    encapsulated and minimized to ease porting.  Generally it
    should be restricted to the architecture-dependent part of
    the kernel tree.

ioctls: Not writing a new system call

A system call generally looks like this:

    asmlinkage int sys_mycall(int arg)
    {
            return 0;
    }

First, in most cases you don't want to create a new system call.
You create a character device and implement an appropriate ioctl
for it.  This is much more flexible than system calls, doesn't have
to be entered in every architecture's
include/asm/unistd.h and
arch/kernel/entry.S file, and is much more
likely to be accepted by Linus.

If all your routine does is read or write some parameter, consider
implementing a sysctl interface instead.

Inside the ioctl you're in user context to a process.  When an
error occurs you return a negated errno (see
include/linux/errno.h),
otherwise you return 0.

After you have slept you should check if a signal occurred: the
Unix/Linux way of handling signals is to temporarily exit the
system call with the -ERESTARTSYS error.  The
system call entry code will switch back to user context, process
the signal handler and then your system call will be restarted
(unless the user disabled that).  So you should be prepared to
process the restart, e.g. if you're in the middle of manipulating
some data structure.

    /* signal_pending() takes the task to test; here, the current one */
    if (signal_pending(current))
            return -ERESTARTSYS;

If you're doing longer computations: first think userspace. If you
really want to do it in kernel you should
regularly check if you need to give up the CPU (remember there is
cooperative multitasking per CPU).  Idiom:

    if (current->need_resched)
            schedule(); /* Will sleep */

A short note on interface design: the UNIX system call motto is
"Provide mechanism not policy".

Recipes for Deadlock

You cannot call any routines which may sleep, unless:

    You are in user context.

    You do not own any spinlocks.

    You have interrupts enabled (actually, Andi Kleen says
    that the scheduling code will enable them for you, but
    that's probably not what you wanted).

Note that some functions may sleep implicitly: common ones are
the user space access functions (*_user) and memory allocation
functions without GFP_ATOMIC.

You will eventually lock up your box if you break these rules.

Really.

Common Routines

printk() include/linux/kernel.h

printk() feeds kernel messages to the
console, dmesg, and the syslog daemon.  It is useful for debugging
and reporting errors, and can be used inside interrupt context,
but use with caution: a machine which has its console flooded with
printk messages is unusable.  It uses a format string mostly
compatible with ANSI C printf, and C string concatenation to give
it a first "priority" argument:

    printk(KERN_INFO "i = %u\n", i);
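
The "string concatenation" trick can be tried out in userspace.  The
following is only a sketch: it assumes the 2.4 convention that each
KERN_ level is a short string literal such as "<6>", and the names
SKETCH_KERN_INFO and sketch_log_info() are made up for illustration,
with snprintf() standing in for printk().

```c
#include <stdio.h>

/* Assumption: mirrors the 2.4 style of definition in
 * include/linux/kernel.h, where a level is a string literal.
 * Renamed so nobody mistakes it for the real header. */
#define SKETCH_KERN_INFO "<6>"

/* Hypothetical userspace stand-in: formats into buf what printk()
 * would be handed as its format string plus arguments. */
static int sketch_log_info(char *buf, size_t len, unsigned int i)
{
        /* Adjacent string literals are joined at compile time, so the
         * format string below really is "<6>i = %u\n". */
        return snprintf(buf, len, SKETCH_KERN_INFO "i = %u\n", i);
}
```

The level is therefore part of the message text itself, not a separate
function argument, which is why there is no comma after it.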

See include/linux/kernel.h
for other KERN_ values; these are interpreted by syslog as the
level.  Special case: for printing an IP address use

    __u32 ipaddress;
    printk(KERN_INFO "my ip: %d.%d.%d.%d\n", NIPQUAD(ipaddress));
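
To see why one macro argument can feed four %d conversions, here is a
userspace sketch.  It assumes NIPQUAD() expands to the four bytes of an
address held in network byte order, which is my reading of the 2.4
header; NIPQUAD_SKETCH and format_ip() are invented names, and
snprintf() stands in for printk().

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for NIPQUAD(): expands to four comma-separated
 * expressions, one per byte of the address as stored in memory. */
#define NIPQUAD_SKETCH(addr) \
        ((const unsigned char *)&(addr))[0], \
        ((const unsigned char *)&(addr))[1], \
        ((const unsigned char *)&(addr))[2], \
        ((const unsigned char *)&(addr))[3]

/* Format addr (assumed to be in network byte order) as a dotted quad. */
static int format_ip(char *buf, size_t len, uint32_t addr)
{
        return snprintf(buf, len, "%d.%d.%d.%d", NIPQUAD_SKETCH(addr));
}
```

Because the bytes are read in memory order, the result only makes sense
for addresses kept in network byte order, whatever the host endianness.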

printk() internally uses a 1K buffer and does
not catch overruns.  Make sure that will be enough.

You will know when you are a real kernel hacker
when you start typoing printf as printk in your user programs :)

Another sidenote: the original Unix Version 6 sources had a
comment on top of its printf function: "Printf should not be
used for chit-chat".  You should follow that advice.

copy_[to/from]_user() / get_user() / put_user() include/asm/uaccess.h

[SLEEPS]

put_user() and get_user()
are used to get and put single values (such as an int, char, or
long) from and to userspace.  A pointer into userspace should
never be simply dereferenced: data should be copied using these
routines.  Both return -EFAULT or 0.

copy_to_user() and
copy_from_user() are more general: they copy
an arbitrary amount of data to and from userspace.

Unlike put_user() and
get_user(), they return the amount of
uncopied data (i.e. 0 still means success).

[Yes, this moronic interface makes me cringe.  Please submit a
patch and become my hero --RR.]

These functions may sleep implicitly. They should never be called
outside user context (it makes no sense), with interrupts
disabled, or with a spinlock held.
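
The two return conventions are easy to mix up, so here is a userspace
model of the second one.  Nothing below is kernel code:
model_copy_from_user() and fetch_arg() are hypothetical names, and the
extra "readable" parameter is only a way to fake a bad user pointer.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for copy_from_user(): only the first
 * `readable` bytes of src are treated as accessible.  Like the real
 * routine is described above, it returns the number of bytes NOT
 * copied, so 0 means success. */
static size_t model_copy_from_user(void *dst, const void *src,
                                   size_t n, size_t readable)
{
        size_t copied = n < readable ? n : readable;

        memcpy(dst, src, copied);
        return n - copied;
}

/* The usual caller idiom: any uncopied remainder becomes -EFAULT. */
static int fetch_arg(int *out, const void *src, size_t readable)
{
        if (model_copy_from_user(out, src, sizeof(*out), readable) != 0)
                return -EFAULT;
        return 0;
}
```

The point of the idiom is that the caller never returns the raw count
to userspace: it is translated into a negated errno first.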

kmalloc()/kfree() include/linux/slab.h

[MAY SLEEP: SEE BELOW]

These routines are used to dynamically request pointer-aligned
chunks of memory, like malloc and free do in userspace, but
kmalloc() takes an extra flag word.
Important values:

GFP_KERNEL
    May sleep and swap to free memory. Only allowed in user
    context, but is the most reliable way to allocate memory.

GFP_ATOMIC
    Don't sleep. Less reliable than GFP_KERNEL,
    but may be called from interrupt context. You should
    really have a good out-of-memory
    error-handling strategy.

GFP_DMA
    Allocate ISA DMA lower than 16MB. If you don't know what that
    is you don't need it.  Very unreliable.

If you see a `kmem_grow: Called nonatomically from int'
warning message, you called a memory allocation function
from interrupt context without GFP_ATOMIC.
You should really fix that.  Run, don't walk.

If you are allocating at least PAGE_SIZE
(include/asm/page.h) bytes,
consider using __get_free_pages()
(include/linux/mm.h).  It
takes an order argument (0 for page sized, 1 for double page, 2
for four pages etc.) and the same memory priority flag word as
above.
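
The order argument is a power-of-two exponent, which a small userspace
sketch makes concrete.  The 4096-byte page size is an assumption (the
real value is per-architecture), and order_for_size() is a made-up
helper, not a kernel routine.

```c
#include <stddef.h>

/* Assumed page size for this sketch only; the real PAGE_SIZE comes
 * from include/asm/page.h and differs between architectures. */
#define SKETCH_PAGE_SIZE 4096UL

/* Smallest order such that (SKETCH_PAGE_SIZE << order) >= size.
 * Intended for modest sizes; absurdly large requests are not handled. */
static unsigned int order_for_size(size_t size)
{
        unsigned int order = 0;

        while ((SKETCH_PAGE_SIZE << order) < size)
                order++;
        return order;
}
```

Note the rounding: a request one byte over two pages needs order 2,
i.e. four pages, so sizes just past a power of two waste a lot.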

If you are allocating more than a page worth of bytes you can use
vmalloc().  It'll allocate virtual memory in
the kernel map.  This block is not contiguous in physical memory,
but the MMU makes it look like it is for you
(so it'll only look contiguous to the CPUs, not to external device
drivers).  If you really need large physically contiguous memory
for some weird device, you have a problem: it is poorly supported
in Linux because after some time memory fragmentation in a running
kernel makes it hard.  The best way is to allocate the block early
in the boot process via the alloc_bootmem()
routine.

Before inventing your own cache of often-used objects consider
using a slab cache in
include/linux/slab.h.

current include/asm/current.h

This global variable (really a macro) contains a pointer to
the current task structure, so is only valid in user context.
For example, when a process makes a system call, this will
point to the task structure of the calling process.  It is
not NULL in interrupt context.

udelay()/mdelay() include/asm/delay.h include/linux/delay.h

The udelay() function can be used for small pauses.
Do not use large values with udelay() as you risk
overflow - the helper function mdelay() is useful
here, or even consider schedule_timeout().

cpu_to_be32()/be32_to_cpu()/cpu_to_le32()/le32_to_cpu() include/asm/byteorder.h

The cpu_to_be32() family (where the "32" can
be replaced by 64 or 16, and the "be" can be replaced by "le") are
the general way to do endian conversions in the kernel: they
return the converted value.  All variations supply the reverse as
well: be32_to_cpu(), etc.

There are two major variations of these functions: the pointer
variation, such as cpu_to_be32p(), which take
a pointer to the given type, and return the converted value.  The
other variation is the "in-situ" family, such as
cpu_to_be32s(), which convert the value referred
to by the pointer, and return void.
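
For readers who have not met these before, here is a userspace sketch
of what the 32-bit big-endian members have to achieve.  The sketch_
functions are illustrative stand-ins written from the description
above, not the kernel's implementation (which is per-architecture).

```c
#include <stdint.h>

/* Return a value whose in-memory bytes are most-significant-byte
 * first, whatever the host byte order is. */
static uint32_t sketch_cpu_to_be32(uint32_t host)
{
        union { uint32_t word; unsigned char bytes[4]; } u;

        u.bytes[0] = (unsigned char)(host >> 24);
        u.bytes[1] = (unsigned char)(host >> 16);
        u.bytes[2] = (unsigned char)(host >> 8);
        u.bytes[3] = (unsigned char)host;
        return u.word;
}

/* The reverse: rebuild the host value from big-endian storage. */
static uint32_t sketch_be32_to_cpu(uint32_t be)
{
        union { uint32_t word; unsigned char bytes[4]; } u;

        u.word = be;
        return ((uint32_t)u.bytes[0] << 24) | ((uint32_t)u.bytes[1] << 16) |
               ((uint32_t)u.bytes[2] << 8) | (uint32_t)u.bytes[3];
}

/* "In-situ" flavour: convert through the pointer, return nothing. */
static void sketch_cpu_to_be32s(uint32_t *p)
{
        *p = sketch_cpu_to_be32(*p);
}
```

On a big-endian host both conversions end up as no-ops, which is the
whole attraction: the caller writes the same code everywhere.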

local_irq_save()/local_irq_restore() include/asm/system.h

These routines disable hard interrupts on the local CPU, and
restore them.  They are reentrant; saving the previous state in
their one unsigned long flags argument.  If you
know that interrupts are enabled, you can simply use
local_irq_disable() and
local_irq_enable().

local_bh_disable()/local_bh_enable() include/asm/softirq.h

These routines disable soft interrupts on the local CPU, and
restore them.  They are reentrant; if soft interrupts were
disabled before, they will still be disabled after this pair
of functions has been called.  They prevent softirqs, tasklets
and bottom halves from running on the current CPU.

smp_processor_id()/cpu_[number/logical]_map() include/asm/smp.h

smp_processor_id() returns the current
processor number, between 0 and NR_CPUS (the
maximum number of CPUs supported by Linux, currently 32).  These
values are not necessarily continuous: to get a number between 0
and smp_num_cpus() (the number of actual
processors in this machine), the
cpu_number_map() function is used to map the
processor id to a logical number.
cpu_logical_map() does the reverse.

__init/__exit/__initdata include/linux/init.h

After boot, the kernel frees up a special section; functions
marked with __init and data structures marked with
__initdata are dropped after boot is complete (within
modules this directive is currently ignored).  __exit
is used to declare a function which is only required on exit: the
function will be dropped if this file is not compiled as a module.
See the header file for use. Note that it makes no sense for a function
marked with __init to be exported to modules with
EXPORT_SYMBOL() - this will break.

Static data structures marked as __initdata must be initialised
(as opposed to ordinary static data which is zeroed BSS) and cannot be
const.

__initcall()/module_init() include/linux/init.h

Many parts of the kernel are well served as a module
(dynamically-loadable parts of the kernel).  Using the
module_init() and
module_exit() macros it is easy to write code
without #ifdefs which can operate both as a module or built into
the kernel.

The module_init() macro defines which
function is to be called at module insertion time (if the file is
compiled as a module), or at boot time: if the file is not
compiled as a module the module_init() macro
becomes equivalent to __initcall(), which
through linker magic ensures that the function is called on boot.

The function can return a negative error number to cause
module loading to fail (unfortunately, this has no effect if
the module is compiled into the kernel).  For modules, this is
called in user context, with interrupts enabled, and the
kernel lock held, so it can sleep.

module_exit() include/linux/init.h

This macro defines the function to be called at module removal
time (or never, in the case of the file compiled into the
kernel).  It will only be called if the module usage count has
reached zero.  This function can also sleep, but cannot fail:
everything must be cleaned up by the time it returns.

MOD_INC_USE_COUNT/MOD_DEC_USE_COUNT include/linux/module.h

These manipulate the module usage count, to protect against
removal (a module also can't be removed if another module uses
one of its exported symbols: see below).  Every reference to
the module from user context should be reflected by this
counter (e.g. for every data structure or socket) before the
function sleeps.  To quote Tim Waugh:

    /* THIS IS BAD */
    foo_open (...)
    {
            stuff..
            if (fail)
                    return -EBUSY;
            sleep.. (might get unloaded here)
            stuff..
            MOD_INC_USE_COUNT;
            return 0;
    }

    /* THIS IS GOOD */
    foo_open (...)
    {
            MOD_INC_USE_COUNT;
            stuff..
            if (fail) {
                    MOD_DEC_USE_COUNT;
                    return -EBUSY;
            }
            sleep.. (safe now)
            stuff..
            return 0;
    }

You can often avoid having to deal with these problems by using the
owner field of the
file_operations structure. Set this field
to the macro THIS_MODULE.

For more complicated module unload locking requirements, you can set the
can_unload function pointer to your own routine,
which should return 0 if the module is
unloadable, or -EBUSY otherwise.

Wait Queues include/linux/wait.h

[SLEEPS]

A wait queue is used to wait for someone to wake you up when a
certain condition is true.  They must be used carefully to ensure
there is no race condition.  You declare a
wait_queue_head_t, and then processes which want to
wait for that condition declare a wait_queue_t
referring to themselves, and place that in the queue.

Declaring

You declare a wait_queue_head_t using the
DECLARE_WAIT_QUEUE_HEAD() macro, or using the
init_waitqueue_head() routine in your
initialization code.

Queuing

Placing yourself in the waitqueue is fairly complex, because you
must put yourself in the queue before checking the condition.
There is a macro to do this:
wait_event_interruptible()
(include/linux/sched.h).  The
first argument is the wait queue head, and the second is an
expression which is evaluated; the macro returns
0 when this expression is true, or
-ERESTARTSYS if a signal is received.
The wait_event() version ignores signals.

Do not use the sleep_on() function family -
it is very easy to accidentally introduce races; almost certainly
one of the wait_event() family will do, or a
loop around schedule_timeout(). If you choose
to loop around schedule_timeout() remember
you must set the task state (with
set_current_state()) on each iteration to avoid
busy-looping.

Waking Up Queued Tasks

Call wake_up()
(include/linux/sched.h),
which will wake up every process in the queue.  The exception is
if one has TASK_EXCLUSIVE set, in which case
the remainder of the queue will not be woken.
913
   
914
  
915
 
916
 
917
 
918
  Atomic Operations
919
 
920
  
921
   Certain operations are guaranteed atomic on all platforms.  The
   first class of operations works on atomic_t
   (include/asm/atomic.h); this
   contains a signed integer (at least 24 bits long), and you must use
   these functions to manipulate or read atomic_t variables.
   atomic_read() and
   atomic_set() get and set the counter;
   atomic_add(),
   atomic_sub(),
   atomic_inc() and
   atomic_dec() add to, subtract from, increment
   and decrement it; and
   atomic_dec_and_test() decrements it and returns
   true if it was decremented to zero.
935
  
936
 
937
  
938
   Yes, atomic_dec_and_test() returns true (i.e. != 0) if the
   atomic variable is zero after the decrement.
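
   The "true on reaching zero" convention is easiest to see in a
   reference count.  The following is only a user-space sketch: it uses
   C11 stdatomic.h as a stand-in for the kernel's atomic_t, and the
   names obj_ref, obj_get(), obj_put() and obj_demo() are invented for
   this example.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reference-counted object (stand-in for an atomic_t). */
struct obj_ref {
        atomic_int count;
};

void obj_init(struct obj_ref *r, int initial)
{
        atomic_init(&r->count, initial);        /* plays the role of atomic_set() */
}

int obj_read(struct obj_ref *r)
{
        return atomic_load(&r->count);          /* plays the role of atomic_read() */
}

void obj_get(struct obj_ref *r)
{
        atomic_fetch_add(&r->count, 1);         /* plays the role of atomic_inc() */
}

/* Plays the role of atomic_dec_and_test(): true only for the final put. */
bool obj_put(struct obj_ref *r)
{
        /* atomic_fetch_sub() returns the value before the decrement. */
        return atomic_fetch_sub(&r->count, 1) == 1;
}

/* Take one extra reference, drop both, and check who saw zero. */
int obj_demo(void)
{
        struct obj_ref r;
        bool first, last;

        obj_init(&r, 1);
        obj_get(&r);            /* count is now 2 */
        first = obj_put(&r);    /* 2 -> 1: not the last user */
        last = obj_put(&r);     /* 1 -> 0: last user, free here */
        return !first && last && obj_read(&r) == 0;
}
```

   Only the caller that sees true may free the object, which is why the
   decrement and the test have to be a single atomic step.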
940
  
941
 
942
  
943
   Note that these functions are slower than normal arithmetic, and
944
   so should not be used unnecessarily.  On some platforms they
945
   are much slower, like 32-bit Sparc where they use a spinlock.
946
  
947
 
948
  
949
   The second class of atomic operations is atomic bit operations on a
950
   long, defined in
951
 
952
   include/asm/bitops.h.  These
953
   operations generally take a pointer to the bit pattern, and a bit
954
   number: 0 is the least significant bit.
955
   set_bit(), clear_bit()
956
   and change_bit() set, clear, and flip the
957
   given bit.  test_and_set_bit(),
958
   test_and_clear_bit() and
959
   test_and_change_bit() do the same thing,
960
   except return true if the bit was previously set; these are
961
   particularly useful for very simple locking.
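
   As a rough illustration of the "very simple locking" use, here is a
   user-space sketch built on C11 stdatomic.h.  It is not the kernel's
   bitops implementation: my_test_and_set_bit(), my_clear_bit(),
   MYDEV_BUSY and the mydev_* helpers are all invented names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag bit; bit 0 is the least significant bit. */
#define MYDEV_BUSY 0

static atomic_ulong mydev_flags;        /* static storage: starts as zero */

/* Stand-in for test_and_set_bit(): sets the bit, returns its old value. */
bool my_test_and_set_bit(int nr, atomic_ulong *addr)
{
        unsigned long mask = 1UL << nr;

        return (atomic_fetch_or(addr, mask) & mask) != 0;
}

/* Stand-in for clear_bit(). */
void my_clear_bit(int nr, atomic_ulong *addr)
{
        atomic_fetch_and(addr, ~(1UL << nr));
}

/* Try to claim the device: true if we got it, false if already busy. */
bool mydev_try_claim(void)
{
        return !my_test_and_set_bit(MYDEV_BUSY, &mydev_flags);
}

void mydev_release(void)
{
        my_clear_bit(MYDEV_BUSY, &mydev_flags);
}
```

   Because setting the bit and reading its old value happen in one
   atomic step, two callers can never both believe they claimed the
   device.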
962
  
963
 
964
  
965
   It is possible to call these operations with bit indices greater
966
   than BITS_PER_LONG.  The resulting behavior is strange on big-endian
967
   platforms though so it is a good idea not to do this.
968
  
969
 
970
  
971
   Note that the order of bits depends on the architecture, and in
972
   particular, the bitfield passed to these operations must be at
973
   least as large as a long.
974
  
975
 
976
 
977
 
978
  Symbols
979
 
980
  
981
   Within the kernel proper, the normal linking rules apply
982
   (ie. unless a symbol is declared to be file scope with the
983
   static keyword, it can be used anywhere in the
984
   kernel).  However, for modules, a special exported symbol table is
985
   kept which limits the entry points to the kernel proper.  Modules
986
   can also export symbols.
987
  
988
 
989
  
990
   EXPORT_SYMBOL()
    include/linux/module.h
992
 
993
   
994
    This is the classic method of exporting a symbol, and it works
995
    for both modules and non-modules.  In the kernel all these
996
    declarations are often bundled into a single file to help
997
    genksyms (which searches source files for these declarations).
998
    See the comment on genksyms and Makefiles below.
999
   
1000
  
1001
 
1002
  
1003
   EXPORT_NO_SYMBOLS
    include/linux/module.h
1005
 
1006
   
1007
    If a module exports no symbols then you can specify
1008
    
1009
EXPORT_NO_SYMBOLS;
1010
    
1011
    anywhere in the module.
1012
    In kernel 2.4 and earlier, if a module contains neither
1013
    EXPORT_SYMBOL() nor
1014
    EXPORT_NO_SYMBOLS then the module defaults to
1015
    exporting all non-static global symbols.
1016
    In kernel 2.5 onwards you must explicitly specify whether a module
1017
    exports symbols or not.
1018
   
1019
  
1020
 
1021
  
1022
   EXPORT_SYMBOL_GPL()
    include/linux/module.h
1024
 
1025
   
1026
    Similar to EXPORT_SYMBOL() except that the
1027
    symbols exported by EXPORT_SYMBOL_GPL() can
1028
    only be seen by modules with a
1029
    MODULE_LICENSE() that specifies a GPL
1030
    compatible license.
1031
   
1032
  
1033
 
1034
 
1035
 
1036
  Routines and Conventions
1037
 
1038
  
1039
   Double-linked lists
    include/linux/list.h
1041
 
1042
   
1043
    There are three sets of linked-list routines in the kernel
1044
    headers, but this one seems to be winning out (and Linus has
1045
    used it).  If you don't have some particular pressing need for
    a singly-linked list, it's a good choice.  In fact, I don't care
1047
    whether it's a good choice or not, just use it so we can get
1048
    rid of the others.
1049
   
1050
  
1051
 
1052
  
1053
   Return Conventions
1054
 
1055
   
1056
    For code called in user context, it's very common to defy C
1057
    convention, and return 0 for success,
1058
    and a negative error number
1059
    (eg. -EFAULT) for failure.  This can be
1060
    unintuitive at first, but it's fairly widespread in the networking
1061
    code, for example.
1062
   
1063
 
1064
   
1065
    The filesystem code uses ERR_PTR()
    (include/linux/fs.h) to
    encode a negative error number into a pointer, and
    IS_ERR() and PTR_ERR()
    to get it back out again: this avoids a separate pointer parameter for
    the error number.  Icky, but in a good way.
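
    To make both conventions concrete, here is a small user-space
    sketch.  The three macros are copied from the include/linux/fs.h
    excerpt quoted in the Kernel Cantrips section; the names table,
    lookup_name() and name_length() are invented for this example.

```c
#include <errno.h>
#include <string.h>

/* Same definitions as the fs.h excerpt quoted in Kernel Cantrips. */
#define ERR_PTR(err)    ((void *)((long)(err)))
#define PTR_ERR(ptr)    ((long)(ptr))
#define IS_ERR(ptr)     ((unsigned long)(ptr) > (unsigned long)(-1000))

/* Hypothetical lookup table. */
static const char *const names[] = { "zero", "one", "two" };

/* Returns the name for id, or an encoded -ENOENT if id is out of range. */
const char *lookup_name(int id)
{
        if (id < 0 || id >= (int)(sizeof(names) / sizeof(names[0])))
                return ERR_PTR(-ENOENT);
        return names[id];
}

/* Returns the length of the name, or a negative error number. */
long name_length(int id)
{
        const char *name = lookup_name(id);

        if (IS_ERR(name))
                return PTR_ERR(name);   /* pass the error number up */
        return (long)strlen(name);
}
```

    The caller checks IS_ERR() before touching the pointer, and hands
    the negative error number up unchanged, so one return value carries
    both the result and the failure reason.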
1072
   
1073
  
1074
 
1075
  
1076
   Breaking Compilation
1077
 
1078
   
1079
    Linus and the other developers sometimes change function or
1080
    structure names in development kernels; this is not done just to
1081
    keep everyone on their toes: it reflects a fundamental change
1082
    (eg. can no longer be called with interrupts on, or does extra
1083
    checks, or doesn't do checks which were caught before).  Usually
1084
    this is accompanied by a fairly complete note to the linux-kernel
1085
    mailing list; search the archive.  Simply doing a global replace
1086
    on the file usually makes things worse.
1087
   
1088
  
1089
 
1090
  
1091
   Initializing structure members
1092
 
1093
   
1094
    The preferred method of initializing structures is to use
1095
    designated initialisers, as defined by ISO C99, eg:
1096
   
1097
   
1098
static struct block_device_operations opt_fops = {
1099
        .open               = opt_open,
1100
        .release            = opt_release,
1101
        .ioctl              = opt_ioctl,
1102
        .check_media_change = opt_media_change,
1103
};
1104
   
1105
   
1106
    This makes it easy to grep for, and makes it clear which
1107
    structure fields are set.  You should do this because it looks
1108
    cool.
1109
   
1110
  
1111
 
1112
  
1113
   GNU Extensions
1114
 
1115
   
1116
    GNU Extensions are explicitly allowed in the Linux kernel.
1117
    Note that some of the more complex ones are not very well
1118
    supported, due to lack of general use, but the following are
1119
    considered standard (see the GCC info page section "C
1120
    Extensions" for more details - Yes, really the info page, the
1121
    man page is only a short summary of the stuff in info):
1122
   
1123
   
1124
    
1125
     
1126
      Inline functions
1127
     
1128
    
1129
    
1130
     
1131
      Statement expressions (ie. the ({ and }) constructs).
1132
     
1133
    
1134
    
1135
     
1136
      Declaring attributes of a function / variable / type
1137
      (__attribute__)
1138
     
1139
    
1140
    
1141
     
1142
      typeof
1143
     
1144
    
1145
    
1146
     
1147
      Zero length arrays
1148
     
1149
    
1150
    
1151
     
1152
      Macro varargs
1153
     
1154
    
1155
    
1156
     
1157
      Arithmetic on void pointers
1158
     
1159
    
1160
    
1161
     
1162
      Non-Constant initializers
1163
     
1164
    
1165
    
1166
     
1167
      Assembler Instructions (not outside arch/ and include/asm/)
1168
     
1169
    
1170
    
1171
     
1172
      Function names as strings (__FUNCTION__)
1173
     
1174
    
1175
    
1176
     
1177
      __builtin_constant_p()
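
    As one small illustration of why these extensions earn their keep,
    the sketch below combines a statement expression with typeof to
    build a max macro that evaluates each argument exactly once.  It
    needs GCC or a compatible compiler; max_once(), next_value() and
    max_once_demo() are names invented here, and __typeof__ is used
    instead of typeof only so the example also builds in strict ISO
    modes.

```c
/*
 * Unlike the classic ((a) > (b) ? (a) : (b)), each argument is
 * evaluated once, so side effects are not repeated.
 */
#define max_once(a, b) ({               \
        __typeof__(a) _a = (a);         \
        __typeof__(b) _b = (b);         \
        _a > _b ? _a : _b;              \
})

static int calls;       /* counts evaluations of next_value() */

int next_value(void)
{
        return ++calls;
}

/* Returns 1 if the argument with a side effect ran exactly once. */
int max_once_demo(void)
{
        int m;

        calls = 0;
        m = max_once(next_value(), 0);
        return m == 1 && calls == 1;
}
```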
1178
     
1179
    
1180
   
1181
 
1182
   
1183
    Be wary when using long long in the kernel: the code gcc generates for
    it is horrible, and worse, division and multiplication do not work
    on i386 because the GCC runtime functions for them are missing from
    the kernel environment.
1187
   
1188
 
1189
    
1190
  
1191
 
1192
  
1193
   C++
1194
 
1195
   
1196
    Using C++ in the kernel is usually a bad idea, because the
1197
    kernel does not provide the necessary runtime environment
1198
    and the include files are not tested for it.  It is still
1199
    possible, but not recommended.  If you really want to do
1200
    this, forget about exceptions at least.
1201
   
1202
  
1203
 
1204
  
1205
   #if
1206
 
1207
   
1208
    It is generally considered cleaner to use macros in header files
1209
    (or at the top of .c files) to abstract away functions rather than
1210
    using `#if' pre-processor statements throughout the source code.
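
    A minimal sketch of the idea, assuming a hypothetical driver "foo"
    with an optional statistics feature (CONFIG_FOO_STATS, struct
    foo_dev and the foo_* names are all invented): the single #ifdef
    lives where a header would put it, and the callers stay free of
    pre-processor conditionals.

```c
/* Hypothetical device structure. */
struct foo_dev {
        int handled;
        int packets;    /* only maintained when statistics are enabled */
};

/* This part would normally live in foo.h: the only #ifdef. */
#ifdef CONFIG_FOO_STATS
#define foo_count_packet(dev)   ((dev)->packets++)
#else
#define foo_count_packet(dev)   do { } while (0)
#endif

/* This part would live in foo.c: no #if in sight. */
void foo_handle(struct foo_dev *dev)
{
        dev->handled++;
        foo_count_packet(dev);
}

/* Handle two requests and report how many were handled. */
int foo_demo(void)
{
        struct foo_dev dev = { 0, 0 };

        foo_handle(&dev);
        foo_handle(&dev);
        return dev.handled;
}
```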
1211
   
1212
  
1213
 
1214
 
1215
 
1216
  Putting Your Stuff in the Kernel
1217
 
1218
  
1219
   In order to get your stuff into shape for official inclusion, or
1220
   even to make a neat patch, there's administrative work to be
1221
   done:
1222
  
1223
  
1224
   
1225
    
1226
     Figure out whose pond you've been pissing in.  Look at the top of
1227
     the source files, inside the MAINTAINERS
1228
     file, and last of all in the CREDITS file.
1229
     You should coordinate with this person to make sure you're not
1230
     duplicating effort, or trying something that's already been
1231
     rejected.
1232
    
1233
 
1234
    
1235
     Make sure you put your name and EMail address at the top of
1236
     any files you create or mangle significantly.  This is the
1237
     first place people will look when they find a bug, or when
1238
     they want to make a change.
1239
    
1240
   
1241
 
1242
   
1243
    
1244
     Usually you want a configuration option for your kernel hack.
1245
     Edit Config.in in the appropriate directory
1246
     (but under arch/ it's called
1247
     config.in).  The Config Language used is not
1248
     bash, even though it looks like bash; the safe way is to use only
1249
     the constructs that you already see in
1250
     Config.in files (see
1251
     Documentation/kbuild/config-language.txt).
1252
     It's good to run "make xconfig" at least once to test (because
1253
     it's the only one with a static parser).
1254
    
1255
 
1256
    
1257
     Variables which can be Y or N use bool followed by a
     tagline and the config define name (which must start with
     CONFIG_).  The tristate function is the same, but
     allows the answer M (which defines
     CONFIG_FOO_MODULE in your source, instead of
     CONFIG_FOO) if CONFIG_MODULES
     is enabled.
1264
    
1265
 
1266
    
1267
     You may well want to make your CONFIG option only visible if
1268
     CONFIG_EXPERIMENTAL is enabled: this serves as a
1269
     warning to users.  There are many other fancy things you can do: see
1270
     the various Config.in files for ideas.
1271
    
1272
   
1273
 
1274
   
1275
    
1276
     Edit the Makefile: the CONFIG variables are
1277
     exported here so you can conditionalize compilation with `ifeq'.
1278
     If your file exports symbols then add the names to
1279
     export-objs so that genksyms will find them.
1280
     
1281
      
1282
       There is a restriction on the kernel build system that objects
1283
       which export symbols must have globally unique names.
1284
       If your object does not have a globally unique name then the
1285
       standard fix is to move the
1286
       EXPORT_SYMBOL() statements to their own
1287
       object with a unique name.
1288
       This is why several systems have separate exporting objects,
1289
       usually suffixed with ksyms.
1290
      
1291
     
1292
    
1293
   
1294
 
1295
   
1296
    
1297
     Document your option in Documentation/Configure.help.  Mention
     incompatibilities and issues here.  Definitely
     end your description with "if in doubt, say N"
     (or, occasionally, `Y'); this is for people who have no
     idea what you are talking about.
1302
    
1303
   
1304
 
1305
   
1306
    
1307
     Put yourself in CREDITS if you've done
1308
     something noteworthy, usually beyond a single file (your name
1309
     should be at the top of the source files anyway).
1310
     MAINTAINERS means you want to be consulted
1311
     when changes are made to a subsystem, and hear about bugs; it
1312
     implies a more-than-passing commitment to some part of the code.
1313
    
1314
   
1315
 
1316
   
1317
    
1318
     Finally, don't forget to read Documentation/SubmittingPatches
1319
     and possibly Documentation/SubmittingDrivers.
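
     For the Config.in and Makefile steps above, the fragments for a
     hypothetical driver might look roughly like this.  CONFIG_FOO,
     the drivers/foo/ paths and foo.o are invented names, and the
     syntax is only a sketch of what 2.4-era Config.in files and
     Makefiles commonly contain, so compare it with the neighbouring
     files in your own tree before copying anything.

```
# drivers/foo/Config.in (hypothetical): only offered to people who
# have asked for experimental options
if [ "$CONFIG_EXPERIMENTAL" = "y" ]; then
   tristate 'Foo device support (EXPERIMENTAL)' CONFIG_FOO
fi

# drivers/foo/Makefile (hypothetical): foo.o contains EXPORT_SYMBOL()
# statements, so it is listed in export-objs for genksyms
export-objs := foo.o
obj-$(CONFIG_FOO) += foo.o
```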
1320
    
1321
   
1322
  
1323
 
1324
 
1325
 
1326
  Kernel Cantrips
1327
 
1328
  
1329
   Some favorites from browsing the source.  Feel free to add to this
1330
   list.
1331
  
1332
 
1333
  
1334
   include/linux/brlock.h:
1335
  
1336
  
1337
extern inline void br_read_lock (enum brlock_indices idx)
1338
{
1339
        /*
1340
         * This causes a link-time bug message if an
1341
         * invalid index is used:
1342
         */
1343
        if (idx >= __BR_END)
1344
                __br_lock_usage_bug();
1345
 
1346
        read_lock(&__brlock_array[smp_processor_id()][idx]);
1347
}
1348
  
1349
 
1350
  
1351
   include/linux/fs.h:
1352
  
1353
  
1354
/*
1355
 * Kernel pointers have redundant information, so we can use a
1356
 * scheme where we can return either an error code or a dentry
1357
 * pointer with the same return value.
1358
 *
1359
 * This should be a per-architecture thing, to allow different
1360
 * error and pointer decisions.
1361
 */
1362
 #define ERR_PTR(err)    ((void *)((long)(err)))
1363
 #define PTR_ERR(ptr)    ((long)(ptr))
1364
 #define IS_ERR(ptr)     ((unsigned long)(ptr) > (unsigned long)(-1000))
1365
1366
 
1367
  
1368
   include/asm-i386/uaccess.h:
1369
  
1370
 
1371
  
1372
#define copy_to_user(to,from,n)                         \
1373
        (__builtin_constant_p(n) ?                      \
1374
         __constant_copy_to_user((to),(from),(n)) :     \
1375
         __generic_copy_to_user((to),(from),(n)))
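
   The same dispatch trick can be tried outside the kernel.  The
   sketch below assumes GCC or a compatible compiler; fill_zero(),
   fill_constant(), fill_generic() and fill_demo() are invented names,
   and the two counters exist only so the demo can report which path
   was taken.

```c
#include <string.h>

static int constant_calls, generic_calls;       /* demo bookkeeping only */

/* Path for sizes known at compile time. */
unsigned long fill_constant(char *buf, unsigned long n)
{
        constant_calls++;
        memset(buf, 0, n);
        return n;
}

/* Path for sizes only known at run time. */
unsigned long fill_generic(char *buf, unsigned long n)
{
        generic_calls++;
        memset(buf, 0, n);
        return n;
}

/* Same shape as the copy_to_user() macro above. */
#define fill_zero(buf, n)                       \
        (__builtin_constant_p(n) ?              \
         fill_constant((buf), (n)) :            \
         fill_generic((buf), (n)))

/* Returns 1 if each path was taken exactly once. */
int fill_demo(void)
{
        char buf[16];
        /* volatile, so the compiler cannot treat it as a constant */
        volatile unsigned long runtime_n = 8;

        constant_calls = generic_calls = 0;
        fill_zero(buf, 16);             /* literal: constant path */
        fill_zero(buf, runtime_n);      /* run-time value: generic path */
        return constant_calls == 1 && generic_calls == 1;
}
```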
1376
  
1377
 
1378
  
1379
   arch/sparc/kernel/head.S:
1380
  
1381
 
1382
  
1383
/*
1384
 * Sun people can't spell worth damn. "compatability" indeed.
1385
 * At least we *know* we can't spell, and use a spell-checker.
1386
 */
1387
 
1388
/* Uh, actually Linus it is I who cannot spell. Too much murky
1389
 * Sparc assembly will do this to ya.
1390
 */
1391
C_LABEL(cputypvar):
1392
        .asciz "compatability"
1393
 
1394
/* Tested on SS-5, SS-10. Probably someone at Sun applied a spell-checker. */
1395
        .align 4
1396
C_LABEL(cputypvar_sun4m):
1397
        .asciz "compatible"
1398
  
1399
 
1400
  
1401
   arch/sparc/lib/checksum.S:
1402
  
1403
 
1404
  
1405
        /* Sun, you just can't beat me, you just can't.  Stop trying,
1406
         * give up.  I'm serious, I am going to kick the living shit
1407
         * out of you, game over, lights out.
1408
         */
1409
  
1410
 
1411
 
1412
 
1413
  Thanks
1414
 
1415
  
1416
   Thanks to Andi Kleen for the idea, answering my questions, fixing
1417
   my mistakes, filling content, etc.  Philipp Rumpf for more spelling
1418
   and clarity fixes, and some excellent non-obvious points.  Werner
1419
   Almesberger for giving me a great summary of
   disable_irq(), and Jes Sorensen and Andrea
   Arcangeli for adding caveats.  Michael Elizabeth Chastain for checking
   and adding to the Configure section.  Telsa Gwynne for teaching me DocBook.