Unreliable Guide To Locking

Paul Rusty Russell <rusty@rustcorp.com.au>

Copyright 2000 Paul Russell

This documentation is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.

For more details see the file COPYING in the source distribution of Linux.
Introduction

Welcome to Rusty's Remarkably Unreliable Guide to Kernel Locking issues.  This document describes the locking systems in the Linux Kernel as we approach 2.4.

It looks like SMP is here to stay; so everyone hacking on the kernel these days needs to know the fundamentals of concurrency and locking for SMP.
The Problem With Concurrency

(Skip this if you know what a Race Condition is.)

In a normal program, you can increment a counter like so:

        very_important_count++;

This is what you would expect to happen:

Expected Results

        Instance 1                        Instance 2
        read very_important_count (5)
        add 1 (6)
        write very_important_count (6)
                                          read very_important_count (6)
                                          add 1 (7)
                                          write very_important_count (7)

This is what might happen:

Possible Results

        Instance 1                        Instance 2
        read very_important_count (5)
                                          read very_important_count (5)
        add 1 (6)
                                          add 1 (6)
        write very_important_count (6)
                                          write very_important_count (6)

This overlap, where what actually happens depends on the relative timing of multiple tasks, is called a race condition.  The piece of code containing the concurrency issue is called a critical region.  Especially since Linux started running on SMP machines, race conditions have become one of the major issues in kernel design and implementation.

The solution is to recognize when these simultaneous accesses occur, and use locks to make sure that only one instance can enter the critical region at any time.  There are many friendly primitives in the Linux kernel to help you do this.  And then there are the unfriendly primitives, but I'll pretend they don't exist.
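As a minimal sketch (the lock and function names here are invented for illustration, not taken from the kernel), the counter above can be made safe with the spinlock primitive introduced in the next chapter:

        #include <linux/spinlock.h>

        static spinlock_t very_important_lock = SPIN_LOCK_UNLOCKED;
        static int very_important_count;

        void increment_very_important_count(void)
        {
                spin_lock(&very_important_lock);   /* only one holder at a time */
                very_important_count++;            /* the critical region */
                spin_unlock(&very_important_lock);
        }

With the lock held, the read-modify-write of the counter can no longer interleave between two instances, so the "Possible Results" above collapse back into the "Expected Results".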
Two Main Types of Kernel Locks: Spinlocks and Semaphores

There are two main types of kernel locks.  The fundamental type is the spinlock (include/asm/spinlock.h), which is a very simple single-holder lock: if you can't get the spinlock, you keep trying (spinning) until you can.  Spinlocks are very small and fast, and can be used anywhere.

The second type is a semaphore (include/asm/semaphore.h): it can have more than one holder at any time (the number decided at initialization time), although it is most commonly used as a single-holder lock (a mutex).  If you can't get a semaphore, your task will put itself on the queue, and be woken up when the semaphore is released.  This means the CPU will do something else while you are waiting, but there are many cases when you simply can't sleep (see the section on Things Which Sleep below), and so have to use a spinlock instead.

Neither type of lock is recursive: see the section on Deadlock below.
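As a sketch of how each type is declared and used (the names here are invented; the semaphore variant belongs only in user context, since the holder may sleep):

        #include <linux/spinlock.h>
        #include <asm/semaphore.h>

        /* A spinlock: the holder busy-waits, so it can be used anywhere,
           but must only be held across short, non-sleeping code. */
        static spinlock_t my_lock = SPIN_LOCK_UNLOCKED;

        /* A semaphore initialized to 1, i.e. a mutex: the holder may sleep. */
        static DECLARE_MUTEX(my_sem);

        void use_spinlock(void)
        {
                spin_lock(&my_lock);
                /* ... short critical region, must not sleep ... */
                spin_unlock(&my_lock);
        }

        void use_semaphore(void)
        {
                down(&my_sem);
                /* ... critical region, may sleep ... */
                up(&my_sem);
        }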
Locks and Uniprocessor Kernels

For kernels compiled without CONFIG_SMP, spinlocks do not exist at all.  This is an excellent design decision: when no-one else can run at the same time, there is no reason to have a lock at all.

You should always test your locking code with CONFIG_SMP enabled, even if you don't have an SMP test box, because it will still catch some (simple) kinds of deadlock.

Semaphores still exist, because they are required for synchronization between user contexts, as we will see below.
Read/Write Lock Variants

Both spinlocks and semaphores have read/write variants: rwlock_t and struct rw_semaphore.  These divide users into two classes: the readers and the writers.  If you are only reading the data, you can get a read lock, but to write to the data you need the write lock.  Many people can hold a read lock, but a writer must be the sole holder.

This means much smoother locking if your code divides up neatly along reader/writer lines.  All the discussions below also apply to read/write variants.
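A small sketch of the spinlock variant (invented names, not from the kernel source):

        #include <linux/spinlock.h>
        #include <linux/list.h>

        static rwlock_t my_table_lock = RW_LOCK_UNLOCKED;
        static LIST_HEAD(my_table);

        /* Any number of readers can hold the read lock at once. */
        int my_table_empty(void)
        {
                int empty;

                read_lock(&my_table_lock);
                empty = list_empty(&my_table);
                read_unlock(&my_table_lock);
                return empty;
        }

        /* A writer excludes both readers and other writers. */
        void my_table_add(struct list_head *new)
        {
                write_lock(&my_table_lock);
                list_add(new, &my_table);
                write_unlock(&my_table_lock);
        }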
Locking Only In User Context

If you have a data structure which is only ever accessed from user context, then you can use a simple semaphore (include/asm/semaphore.h) to protect it.  This is the most trivial case: you initialize the semaphore to the number of resources available (usually 1), and call down_interruptible() to grab the semaphore, and up() to release it.  There is also a down(), which should be avoided, because it will not return if a signal is received.

Example: linux/net/core/netfilter.c allows registration of new setsockopt() and getsockopt() calls, with nf_register_sockopt().  Registration and de-registration are only done on module load and unload (and boot time, where there is no concurrency), and the list of registrations is only consulted for an unknown setsockopt() or getsockopt() system call.  The nf_sockopt_mutex is perfect to protect this, especially since the setsockopt and getsockopt calls may well sleep.
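In the same spirit as the netfilter example (this is an invented registration list, not the actual netfilter code), the usual down_interruptible() pattern looks like this:

        #include <asm/semaphore.h>
        #include <linux/errno.h>
        #include <linux/list.h>

        static DECLARE_MUTEX(my_list_sem);  /* protects my_list; user context only */
        static LIST_HEAD(my_list);

        int my_register(struct list_head *new)
        {
                /* down_interruptible() returns non-zero if a signal woke us
                   up before we got the semaphore: give up and report it. */
                if (down_interruptible(&my_list_sem))
                        return -EINTR;

                list_add(new, &my_list);
                up(&my_list_sem);
                return 0;
        }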
Locking Between User Context and BHs

If a bottom half shares data with user context, you have two problems.  Firstly, the current user context can be interrupted by a bottom half, and secondly, the critical region could be entered from another CPU.  This is where spin_lock_bh() (include/linux/spinlock.h) is used.  It disables bottom halves on that CPU, then grabs the lock.  spin_unlock_bh() does the reverse.

This works perfectly for UP as well: the spin lock vanishes, and this macro simply becomes local_bh_disable() (include/asm/softirq.h), which protects you from the bottom half being run.
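A sketch of the two sides (invented names; the bottom-half side gets away with a plain spin_lock() because user context can never interrupt a bottom half):

        #include <linux/spinlock.h>

        static spinlock_t counter_lock = SPIN_LOCK_UNLOCKED;
        static unsigned long shared_counter;

        /* User context: block bottom halves on this CPU and lock out
           other CPUs while touching the shared counter. */
        void user_context_bump(void)
        {
                spin_lock_bh(&counter_lock);
                shared_counter++;
                spin_unlock_bh(&counter_lock);
        }

        /* Bottom half: a plain spin_lock() is enough here, since the
           bottom half cannot be interrupted by user context. */
        void bottom_half_bump(void)
        {
                spin_lock(&counter_lock);
                shared_counter++;
                spin_unlock(&counter_lock);
        }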
Locking Between User Context and Tasklets/Soft IRQs

This is exactly the same as above, because local_bh_disable() actually disables all softirqs and tasklets on that CPU as well.  It should really be called `local_softirq_disable()', but the name has been preserved for historical reasons.  Similarly, spin_lock_bh() would now be called spin_lock_softirq() in a perfect world.
Locking Between Bottom Halves

Sometimes a bottom half might want to share data with another bottom half (especially remember that timers are run off a bottom half).

The Same BH

Since a bottom half is never run on two CPUs at once, you don't need to worry about your bottom half being run twice at once, even on SMP.

Different BHs

Since only one bottom half ever runs at a time, you don't need to worry about race conditions with other bottom halves.  Beware that things might change under you, however, if someone changes your bottom half to a tasklet.  If you want to make your code future-proof, pretend you're already running from a tasklet (see below), and do the extra locking.  Of course, if it's five years before that happens, you're gonna look like a damn fool.
Locking Between Tasklets

Sometimes a tasklet might want to share data with another tasklet, or a bottom half.

The Same Tasklet

Since a tasklet is never run on two CPUs at once, you don't need to worry about your tasklet being reentrant (running twice at once), even on SMP.

Different Tasklets

If another tasklet (or bottom half, such as a timer) wants to share data with your tasklet, you will both need to use spin_lock() and spin_unlock() calls.  spin_lock_bh() is unnecessary here, as you are already in a tasklet, and none will be run on the same CPU.
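For example (an invented pair of tasklets, not from any driver), two different tasklets sharing a counter only need the plain calls:

        #include <linux/spinlock.h>
        #include <linux/interrupt.h>

        static spinlock_t shared_lock = SPIN_LOCK_UNLOCKED;
        static unsigned long shared_events;

        /* Plain spin_lock() is enough: no other tasklet or BH can
           preempt a running tasklet on this CPU. */
        static void first_tasklet_fn(unsigned long data)
        {
                spin_lock(&shared_lock);
                shared_events++;
                spin_unlock(&shared_lock);
        }

        static void second_tasklet_fn(unsigned long data)
        {
                spin_lock(&shared_lock);
                shared_events += 2;
                spin_unlock(&shared_lock);
        }

        static DECLARE_TASKLET(first_tasklet, first_tasklet_fn, 0);
        static DECLARE_TASKLET(second_tasklet, second_tasklet_fn, 0);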
Locking Between Softirqs

Often a softirq might want to share data with itself, a tasklet, or a bottom half.

The Same Softirq

The same softirq can run on the other CPUs: you can use a per-CPU array (see the Per-CPU Data section below) for better performance.  If you're going so far as to use a softirq, you probably care about scalable performance enough to justify the extra complexity.

You'll need to use spin_lock() and spin_unlock() for shared data.

Different Softirqs

You'll need to use spin_lock() and spin_unlock() for shared data, whether it be a timer (which can be running on a different CPU), bottom half, tasklet or the same or another softirq.
Hard IRQ Context

Hardware interrupts usually communicate with a bottom half, tasklet or softirq.  Frequently this involves putting work in a queue, which the BH/softirq will take out.
Locking Between Hard IRQ and Softirqs/Tasklets/BHs

If a hardware irq handler shares data with a softirq, you have two concerns.  Firstly, the softirq processing can be interrupted by a hardware interrupt, and secondly, the critical region could be entered by a hardware interrupt on another CPU.  This is where spin_lock_irq() is used.  It is defined to disable interrupts on that CPU, then grab the lock.  spin_unlock_irq() does the reverse.

This works perfectly for UP as well: the spin lock vanishes, and this macro simply becomes local_irq_disable() (include/asm/smp.h), which protects you from the softirq/tasklet/BH being run.

spin_lock_irqsave() (include/linux/spinlock.h) is a variant which saves whether interrupts were on or off in a flags word, which is passed to spin_unlock_irqrestore().  This means that the same code can be used inside a hard irq handler (where interrupts are already off) and in softirqs (where the irq disabling is required).
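A sketch of the classic "hard irq puts work on a queue, softirq takes it out" arrangement (invented names, not from any particular driver), using the flags variant so the same helpers can be called from either side:

        #include <linux/spinlock.h>
        #include <linux/list.h>

        static spinlock_t fifo_lock = SPIN_LOCK_UNLOCKED;
        static LIST_HEAD(fifo);

        /* Called from the hard irq handler: interrupts are already off,
           so the flags variant costs nothing extra. */
        void queue_work_item(struct list_head *item)
        {
                unsigned long flags;

                spin_lock_irqsave(&fifo_lock, flags);
                list_add_tail(item, &fifo);
                spin_unlock_irqrestore(&fifo_lock, flags);
        }

        /* Called from the softirq/tasklet: here the irq disabling really
           matters, or the handler above could deadlock against us. */
        struct list_head *take_work_item(void)
        {
                struct list_head *item = NULL;
                unsigned long flags;

                spin_lock_irqsave(&fifo_lock, flags);
                if (!list_empty(&fifo)) {
                        item = fifo.next;
                        list_del(item);
                }
                spin_unlock_irqrestore(&fifo_lock, flags);
                return item;
        }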
Common Techniques

This section lists some common dilemmas and the standard solutions used in the Linux kernel code.  If you use these, people will find your code simpler to understand.

If I could give you one piece of advice: never sleep with anyone crazier than yourself.  But if I had to give you advice on locking: keep it simple.

Lock data, not code.

Be reluctant to introduce new locks.

Strangely enough, this is the exact reverse of my advice when you have slept with someone crazier than yourself.
No Writers in Interrupt Context

There is a fairly common case where an interrupt handler needs access to a critical region, but does not need write access.  In this case, you do not need to use read_lock_irq(), but only read_lock() everywhere (since if an interrupt occurs, the irq handler will only try to grab a read lock, which won't deadlock).  You will still need to use write_lock_irq().

Similar logic applies to locking between softirqs/tasklets/BHs which never need a write lock, and user context: read_lock() and write_lock_bh().
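A sketch of that split (invented statistics code): readers, including the interrupt handler, use read_lock(); the user-context writer must block interrupts while it holds the write lock, or the handler could spin forever on this CPU waiting for it:

        #include <linux/kernel.h>
        #include <linux/spinlock.h>

        static rwlock_t stats_lock = RW_LOCK_UNLOCKED;
        static unsigned long packets, errors;

        /* Interrupt handler: read-only access, plain read_lock() is fine. */
        void irq_report_stats(void)
        {
                read_lock(&stats_lock);
                printk(KERN_INFO "packets %lu, errors %lu\n", packets, errors);
                read_unlock(&stats_lock);
        }

        /* User context writer: interrupts must be off while we hold the
           write lock, hence write_lock_irq(). */
        void update_stats(unsigned long p, unsigned long e)
        {
                write_lock_irq(&stats_lock);
                packets += p;
                errors += e;
                write_unlock_irq(&stats_lock);
        }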
Deadlock: Simple and Advanced

There is a coding bug where a piece of code tries to grab a spinlock twice: it will spin forever, waiting for the lock to be released (spinlocks, rwlocks and semaphores are not recursive in Linux).  This is trivial to diagnose: not a stay-up-five-nights-talk-to-fluffy-code-bunnies kind of problem.

For a slightly more complex case, imagine you have a region shared by a bottom half and user context.  If you use a spin_lock() call to protect it, it is possible that the user context will be interrupted by the bottom half while it holds the lock, and the bottom half will then spin forever trying to get the same lock.

Both of these are called deadlock, and as shown above, deadlock can occur even with a single CPU (although not on UP compiles, since spinlocks vanish on kernel compiles with CONFIG_SMP=n; you'll still get data corruption in the second example).

This complete lockup is easy to diagnose: on SMP boxes the watchdog timer or compiling with DEBUG_SPINLOCKS set (include/linux/spinlock.h) will show this up immediately when it happens.

A more complex problem is the so-called `deadly embrace', involving two or more locks.  Say you have a hash table: each entry in the table is a spinlock, and a chain of hashed objects.  Inside a softirq handler, you sometimes want to alter an object from one place in the hash to another: you grab the spinlock of the old hash chain and the spinlock of the new hash chain, and delete the object from the old one, and insert it in the new one.

There are two problems here.  First, if your code ever tries to move the object to the same chain, it will deadlock with itself as it tries to lock it twice.  Secondly, if the same softirq on another CPU is trying to move another object in the reverse direction, the following could happen:

Consequences

        CPU 1                       CPU 2
        Grab lock A -> OK           Grab lock B -> OK
        Grab lock B -> spin         Grab lock A -> spin

The two CPUs will spin forever, waiting for the other to give up its lock.  It will look, smell, and feel like a crash.
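One conventional fix for the hash-chain move (a sketch of my own, not kernel code) is to take the two chain locks in a fixed order, for example by address, and to take only one lock when the source and destination chains are the same:

        #include <linux/spinlock.h>

        void lock_two_chains(spinlock_t *old_lock, spinlock_t *new_lock)
        {
                if (old_lock == new_lock) {
                        /* Same chain: locking it twice would self-deadlock. */
                        spin_lock(old_lock);
                } else if (old_lock < new_lock) {
                        spin_lock(old_lock);
                        spin_lock(new_lock);
                } else {
                        spin_lock(new_lock);
                        spin_lock(old_lock);
                }
        }

        void unlock_two_chains(spinlock_t *old_lock, spinlock_t *new_lock)
        {
                if (old_lock != new_lock)
                        spin_unlock(new_lock);
                spin_unlock(old_lock);
        }

The next section explains why such ordering schemes are hard to maintain in practice.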
Preventing Deadlock

Textbooks will tell you that if you always lock in the same order, you will never get this kind of deadlock.  Practice will tell you that this approach doesn't scale: when I create a new lock, I don't understand enough of the kernel to figure out where in the 5000 lock hierarchy it will fit.

The best locks are encapsulated: they never get exposed in headers, and are never held around calls to non-trivial functions outside the same file.  You can read through this code and see that it will never deadlock, because it never tries to grab another lock while it has that one.  People using your code don't even need to know you are using a lock.

A classic problem here is when you provide callbacks or hooks: if you call these with the lock held, you risk simple deadlock, or a deadly embrace (who knows what the callback will do?).  Remember, the other programmers are out to get you, so don't do this.
Overzealous Prevention Of Deadlocks

Deadlocks are problematic, but not as bad as data corruption.  Code which grabs a read lock, searches a list, fails to find what it wants, drops the read lock, grabs a write lock and inserts the object has a race condition.

If you don't see why, please stay the fuck away from my code.
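A sketch of the safe pattern (invented names): between dropping the read lock and getting the write lock another CPU may have inserted the same object, so the search has to happen under the write lock itself:

        #include <linux/spinlock.h>
        #include <linux/list.h>

        struct my_entry {
                struct list_head list;
                int key;
        };

        static rwlock_t cache_lock = RW_LOCK_UNLOCKED;
        static LIST_HEAD(cache);

        /* Caller must hold cache_lock (read or write). */
        static struct my_entry *__cache_find(int key)
        {
                struct list_head *i;

                list_for_each(i, &cache) {
                        struct my_entry *e = list_entry(i, struct my_entry, list);
                        if (e->key == key)
                                return e;
                }
                return NULL;
        }

        /* Insert only if the key is still absent, checked under the write
           lock: no window for another CPU to slip in a duplicate. */
        void cache_add_unless_present(struct my_entry *new)
        {
                write_lock(&cache_lock);
                if (__cache_find(new->key) == NULL)
                        list_add(&new->list, &cache);
                write_unlock(&cache_lock);
        }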
Per-CPU Data

A great technique for avoiding locking which is used fairly widely is to duplicate information for each CPU.  For example, if you wanted to keep a count of a common condition, you could use a spin lock and a single counter.  Nice and simple.

If that was too slow [it's probably not], you could instead use a counter for each CPU [don't], then none of them need an exclusive lock [you're wasting your time here].  To make sure the CPUs don't have to synchronize caches all the time, align the counters to cache boundaries by appending `__cacheline_aligned' to the declaration (include/linux/cache.h).  [Can't you think of anything better to do?]

They will need a read lock to access their own counters, however.  That way you can use a write lock to grant exclusive access to all of them at once, to tally them up.
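A sketch of that scheme (my own illustration; padding each slot to SMP_CACHE_BYTES is an assumption about how you would keep the counters on separate cache lines, not something the text spells out):

        #include <linux/cache.h>        /* __cacheline_aligned, SMP_CACHE_BYTES */
        #include <linux/spinlock.h>
        #include <linux/smp.h>          /* smp_processor_id() */
        #include <linux/threads.h>      /* NR_CPUS */

        static rwlock_t count_lock = RW_LOCK_UNLOCKED;

        /* One slot per CPU, each padded to its own cache line. */
        struct cpu_count {
                unsigned long count;
                char pad[SMP_CACHE_BYTES - sizeof(unsigned long)];
        };
        static struct cpu_count counters[NR_CPUS] __cacheline_aligned;

        /* Each CPU bumps only its own slot, under the read lock, so all
           CPUs can count in parallel. */
        void count_event(void)
        {
                read_lock(&count_lock);
                counters[smp_processor_id()].count++;
                read_unlock(&count_lock);
        }

        /* The write lock shuts out every updater at once while we tally. */
        unsigned long count_total(void)
        {
                unsigned long total = 0;
                int i;

                write_lock(&count_lock);
                for (i = 0; i < NR_CPUS; i++)
                        total += counters[i].count;
                write_unlock(&count_lock);
                return total;
        }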
Big Reader Locks

A classic example of per-CPU information is Ingo's `big reader' locks (linux/include/brlock.h).  These use the Per-CPU Data techniques described above to create a lock which is very fast to get a read lock, but agonizingly slow for a write lock.

Fortunately, there are a limited number of these locks available: you have to go through a strict interview process to get one.
Avoiding Locks: Read And Write Ordering

Sometimes it is possible to avoid locking.  Consider the following case from the 2.2 firewall code, which inserted an element into a singly linked list in user context:

        new->next = i->next;
        i->next = new;

Here the author (Alan Cox, who knows what he's doing) assumes that the pointer assignments are atomic.  This is important, because networking packets would traverse this list on bottom halves without a lock.  Depending on their exact timing, they would either see the new element in the list with a valid next pointer, or it would not be in the list yet.  A lock is still required against other CPUs inserting or deleting from the list, of course.

Of course, the writes must be in this order, otherwise the new element appears in the list with an invalid next pointer, and any other CPU iterating at the wrong time will jump through it into garbage.  Because modern CPUs reorder, Alan's code actually reads as follows:

        new->next = i->next;
        wmb();
        i->next = new;

The wmb() is a write memory barrier (include/asm/system.h): neither the compiler nor the CPU will allow any writes to memory after the wmb() to be visible to other hardware before any of the writes before the wmb().

As i386 does not do write reordering, this bug would never show up on that platform.  On other SMP platforms, however, it will.

There is also rmb() for read ordering: to ensure any previous variable reads occur before following reads.  The simple mb() macro combines both rmb() and wmb().

Some atomic operations are defined to act as a memory barrier (ie. as per the mb() macro), but if in doubt, be explicit.  Also, spinlock operations act as partial barriers: operations after gaining a spinlock will never be moved to precede the spin_lock() call, and operations before releasing a spinlock will never be moved after the spin_unlock() call.
Avoiding Locks: Atomic Operations

There are a number of atomic operations defined in include/asm/atomic.h: these are guaranteed to be seen atomically from all CPUs in the system, thus avoiding races.  If your shared data consists of a single counter, say, these operations might be simpler than using spinlocks (although for anything non-trivial using spinlocks is clearer).

Note that the atomic operations do not, in general, act as memory barriers.  Instead you can insert a memory barrier before or after atomic_inc() or atomic_dec() by inserting smp_mb__before_atomic_inc(), smp_mb__after_atomic_inc(), smp_mb__before_atomic_dec() or smp_mb__after_atomic_dec() respectively.  The advantage of using those macros instead of smp_mb() is that they are cheaper on some platforms.
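A minimal sketch of a single counter kept with atomic operations instead of a spinlock (the names are invented):

        #include <asm/atomic.h>

        static atomic_t foo_count = ATOMIC_INIT(0);

        void foo_created(void)
        {
                atomic_inc(&foo_count);
        }

        /* Returns true for the caller who dropped the count to zero. */
        int foo_destroyed(void)
        {
                return atomic_dec_and_test(&foo_count);
        }

        int foo_current_count(void)
        {
                return atomic_read(&foo_count);
        }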
Protecting A Collection of Objects: Reference Counts

Locking a collection of objects is fairly easy: you get a single spinlock, and you make sure you grab it before searching, adding or deleting an object.

The purpose of this lock is not to protect the individual objects: you might have a separate lock inside each one for that.  It is to protect the data structure containing the objects from race conditions.  Often the same lock is used to protect the contents of all the objects as well, for simplicity, but they are inherently orthogonal (and many other big words designed to confuse).

Changing this to a read-write lock will often help markedly if reads are far more common than writes.  If not, there is another approach you can use to reduce the time the lock is held: reference counts.

In this approach, an object has an owner, who sets the reference count to one.  Whenever you get a pointer to the object, you increment the reference count (a `get' operation).  Whenever you relinquish a pointer, you decrement the reference count (a `put' operation).  When the owner wants to destroy it, they mark it dead, and do a put.

Whoever drops the reference count to zero (usually implemented with atomic_dec_and_test()) actually cleans up and frees the object.

This means that you are guaranteed that the object won't vanish underneath you, even though you no longer have a lock for the collection.

Here's some skeleton code:

        void create_foo(struct foo *x)
        {
                atomic_set(&x->use, 1);
                spin_lock_bh(&list_lock);
                ... insert in list ...
                spin_unlock_bh(&list_lock);
        }

        struct foo *get_foo(int desc)
        {
                struct foo *ret;

                spin_lock_bh(&list_lock);
                ... find in list ...
                if (ret) atomic_inc(&ret->use);
                spin_unlock_bh(&list_lock);

                return ret;
        }

        void put_foo(struct foo *x)
        {
                if (atomic_dec_and_test(&x->use))
                        kfree(x);
        }

        void destroy_foo(struct foo *x)
        {
                spin_lock_bh(&list_lock);
                ... remove from list ...
                spin_unlock_bh(&list_lock);

                put_foo(x);
        }

Macros To Help You

There are a set of debugging macros tucked inside include/linux/netfilter_ipv4/lockhelp.h and listhelp.h: these are very useful for ensuring that locks are held in the right places to protect infrastructure.
Things Which Sleep

You can never call the following routines while holding a spinlock, as they may sleep.  This also means you need to be in user context.

- Accesses to userspace:
        copy_from_user()
        copy_to_user()
        get_user()
        put_user()

- kmalloc(GFP_KERNEL)

- down_interruptible() and down()

  There is a down_trylock() which can be used inside interrupt context, as it will not sleep.  up() will also never sleep.

printk() can be called in any context, interestingly enough.
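The usual way to respect this in user context (a sketch with invented names) is to do all the sleeping work, such as allocation and userspace copies, before taking the spinlock:

        #include <linux/types.h>
        #include <linux/errno.h>
        #include <linux/slab.h>         /* kmalloc(), kfree() */
        #include <linux/spinlock.h>
        #include <asm/uaccess.h>        /* copy_from_user() */

        static spinlock_t table_lock = SPIN_LOCK_UNLOCKED;

        int add_entry_from_user(const char *ubuf, size_t len)
        {
                char *kbuf = kmalloc(len, GFP_KERNEL);  /* may sleep: no lock held */

                if (!kbuf)
                        return -ENOMEM;
                if (copy_from_user(kbuf, ubuf, len)) {  /* may sleep too */
                        kfree(kbuf);
                        return -EFAULT;
                }

                spin_lock_bh(&table_lock);
                /* ... link kbuf into the shared table ... */
                spin_unlock_bh(&table_lock);
                return 0;
        }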
The Fucked Up Sparc

Alan Cox says the irq disable/enable is in the register window on a sparc.  Andi Kleen says when you do restore_flags in a different function you mess up all the register windows.

So never pass the flags word set by spin_lock_irqsave() and brethren to another function (unless it's declared inline).  Usually no-one does this, but now you've been warned.  Dave Miller can never do anything in a straightforward manner (I can say that, because I have pictures of him and a certain PowerPC maintainer in a compromising position).
Racing Timers: A Kernel Pastime

Timers can produce their own special problems with races.  Consider a collection of objects (list, hash, etc) where each object has a timer which is due to destroy it.

If you want to destroy the entire collection (say on module removal), you might do the following:

        /* THIS CODE BAD BAD BAD BAD: IF IT WAS ANY WORSE IT WOULD USE
           HUNGARIAN NOTATION */
        spin_lock_bh(&list_lock);

        while (list) {
                struct foo *next = list->next;
                del_timer(&list->timer);
                kfree(list);
                list = next;
        }

        spin_unlock_bh(&list_lock);

Sooner or later, this will crash on SMP, because a timer can have just gone off before the spin_lock_bh(), and it will only get the lock after we spin_unlock_bh(), and then try to free the element (which has already been freed!).

This can be avoided by checking the result of del_timer(): if it returns 1, the timer has been deleted.  If 0, it means (in this case) that it is currently running, so we can do:

        retry:
                spin_lock_bh(&list_lock);

                while (list) {
                        struct foo *next = list->next;
                        if (!del_timer(&list->timer)) {
                                /* Give timer a chance to delete this */
                                spin_unlock_bh(&list_lock);
                                goto retry;
                        }
                        kfree(list);
                        list = next;
                }

                spin_unlock_bh(&list_lock);

Another common problem is deleting timers which restart themselves (by calling add_timer() at the end of their timer function).  Because this is a fairly common case which is prone to races, you should use del_timer_sync() (include/linux/timer.h) to handle this case.  It returns the number of times the timer had to be deleted before we finally stopped it from adding itself back in.
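A sketch of shutting down a self-restarting timer with del_timer_sync() (invented names; the shutting_down flag is my assumption about how the handler would be told to stop rearming):

        #include <linux/sched.h>        /* jiffies, HZ */
        #include <linux/timer.h>

        static struct timer_list poll_timer;
        static int shutting_down;

        /* A self-restarting timer: it adds itself back each time it runs. */
        static void poll_handler(unsigned long data)
        {
                /* ... do the periodic work ... */
                if (!shutting_down) {
                        poll_timer.expires = jiffies + HZ;
                        add_timer(&poll_timer);
                }
        }

        void poll_start(void)
        {
                init_timer(&poll_timer);
                poll_timer.function = poll_handler;
                poll_timer.expires = jiffies + HZ;
                add_timer(&poll_timer);
        }

        void poll_stop(void)
        {
                shutting_down = 1;
                /* del_timer_sync() also waits out a handler running on
                   another CPU, so it cannot re-add itself behind our back. */
                del_timer_sync(&poll_timer);
        }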
Further reading

- Documentation/spinlocks.txt: Linus Torvalds' spinlocking tutorial in the kernel sources.

- Unix Systems for Modern Architectures: Symmetric Multiprocessing and Caching for Kernel Programmers:

  Curt Schimmel's very good introduction to kernel level locking (not written for Linux, but nearly everything applies).  The book is expensive, but really worth every penny to understand SMP locking. [ISBN: 0201633388]
Thanks

Thanks to Telsa Gwynne for DocBooking, neatening and adding style.

Thanks to Martin Pool, Philipp Rumpf, Stephen Rothwell, Paul Mackerras, Ruedi Aschwanden, Alan Cox, Manfred Spraul and Tim Waugh for proofreading, correcting, flaming, commenting.

Thanks to the cabal for having no influence on this document.
Glossary

bh
        Bottom Half: for historical reasons, functions with `_bh' in them often now refer to any software interrupt, e.g. spin_lock_bh() blocks any software interrupt on the current CPU.  Bottom halves are deprecated, and will eventually be replaced by tasklets.  Only one bottom half will be running at any time.

Hardware Interrupt / Hardware IRQ
        Hardware interrupt request.  in_irq() returns true in a hardware interrupt handler (it also returns true when interrupts are blocked).

Interrupt Context
        Not user context: processing a hardware irq or software irq.  Indicated by the in_interrupt() macro returning true (although it also returns true when interrupts or BHs are blocked).

SMP
        Symmetric Multi-Processor: kernels compiled for multiple-CPU machines (CONFIG_SMP=y).

softirq
        Strictly speaking, one of up to 32 enumerated software interrupts which can run on multiple CPUs at once.  Sometimes used to refer to tasklets and bottom halves as well (ie. all software interrupts).

Software Interrupt / Software IRQ
        Software interrupt handler.  in_irq() returns false; in_softirq() returns true.  Tasklets, softirqs and bottom halves all fall into the category of `software interrupts'.

tasklet
        A dynamically-registrable software interrupt, which is guaranteed to only run on one CPU at a time.

UP
        Uni-Processor: Non-SMP (CONFIG_SMP=n).

User Context
        The kernel executing on behalf of a particular process or kernel thread (given by the current() macro).  Not to be confused with userspace.  Can be interrupted by software or hardware interrupts.

Userspace
        A process executing its own code outside the kernel.