The Linux Journalling API

Roger Gammans
rgammans@computer-surgery.co.uk

Stephen Tweedie
sct@redhat.com

Copyright 2002 Roger Gammans

This documentation is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

For more details see the file COPYING in the source distribution of Linux.

Overview

Details

The journalling layer is easy to use. You first need to create a journal_t data structure. There are two calls to do this, depending on how you decide to allocate the physical media on which the journal resides. The journal_init_inode() call is for journals stored in filesystem inodes, and the journal_init_dev() call can be used for a journal stored on a raw device (in a contiguous range of blocks). Both calls hand you back a pointer to a journal_t that the journalling layer has allocated, so when you are finally finished make sure you call journal_destroy() on it to free up any used kernel memory.

Once you have got your journal_t object you need to 'mount' or load the journal file, unless of course you haven't initialised it yet, in which case you need to call journal_create().

Most of the time, however, your journal file will already have been created, but before you load it you must call journal_wipe() to empty the journal file. Hang on, you say, what if the filesystem wasn't cleanly umount()'d? Well, it is the job of the client filesystem to detect this and skip the call to journal_wipe().

In either case the next call should be to journal_load(), which prepares the journal file for use. Note that journal_wipe(..,0) calls journal_skip_recovery() for you if it detects any outstanding transactions in the journal, and similarly journal_load() will call journal_recover() if necessary. I would advise reading fs/ext3/super.c for examples of this stage.

[RGG: Why is the journal_wipe() call necessary? Doesn't this needlessly complicate the API? Or isn't it a good idea for the journal layer to hide dirty mounts from the client fs?]

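The mount-time sequence above boils down to a small decision. The sketch below is a hypothetical userspace model, not kernel code: plan_journal_mount() and its clean flag are invented for illustration, and it only lists, in order, the names of the JBD calls the text says a client makes for an already-created journal. A brand-new journal gets journal_create() instead and is not modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: fills `steps` with the names of the JBD calls a
 * client filesystem makes when mounting an existing journal, and returns
 * how many there are. `clean` is whatever the client fs uses to decide
 * that the last umount was clean. */
size_t plan_journal_mount(int clean, const char *steps[], size_t max)
{
    size_t n = 0;

    assert(max >= 2);
    if (clean)
        steps[n++] = "journal_wipe";  /* clean: empty the journal file */
    steps[n++] = "journal_load";      /* dirty: load recovers the log itself */
    return n;
}
```

After an unclean umount the wipe is skipped, so the outstanding transactions are still there for journal_load() to recover.
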
Now you can go ahead and start modifying the underlying filesystem. Almost.

You still need to actually journal your filesystem changes; this is done by wrapping them into transactions. Additionally, you need to wrap the modification of each of the buffers with calls to the journal layer, so it knows what modifications you are actually making. To do this, use journal_start(), which returns a transaction handle.

journal_start() and its counterpart journal_stop(), which indicates the end of a transaction, are nestable calls, so you can reenter a transaction if necessary. Remember, though, that you must call journal_stop() the same number of times as journal_start() before the transaction is completed (or, more accurately, leaves the update phase). Ext3/VFS makes use of this feature to simplify quota support.

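The nesting rule can be pictured as a reference count on the task's handle. This is a hypothetical model: model_handle, model_start() and model_stop() are invented names, and a single static handle stands in for per-task state. As far as I understand it, the real layer keeps a comparable per-handle nesting count, but this is not its code.

```c
#include <assert.h>

/* Hypothetical stand-in for a nestable transaction handle. */
struct model_handle {
    int ref;        /* number of model_start() calls not yet stopped */
    int in_update;  /* 1 while the transaction is in its update phase */
};

/* Hypothetical per-task slot: the handle the current task is using. */
static struct model_handle task_handle;

struct model_handle *model_start(void)
{
    if (task_handle.ref++ == 0)
        task_handle.in_update = 1;  /* outermost start opens the update */
    return &task_handle;            /* nested starts reuse the same handle */
}

void model_stop(struct model_handle *h)
{
    assert(h->ref > 0);             /* more stops than starts is a bug */
    if (--h->ref == 0)
        h->in_update = 0;           /* only the outermost stop ends it */
}
```

Two starts followed by one stop leave the transaction in its update phase; only the second stop lets it move on towards commit.
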
Inside each transaction you need to wrap the modifications to the individual buffers (blocks). Before you start to modify a buffer you need to call journal_get_{create,write,undo}_access() as appropriate; this allows the journalling layer to copy the unmodified data if it needs to. After all, the buffer may be part of a previously uncommitted transaction. At this point you are at last ready to modify a buffer, and once you have done so you need to call journal_dirty_{meta,}data(). Or, if you've asked for access to a buffer that you now know no longer needs to be pushed back to the device, you can call journal_forget() in much the same way as you might have used bforget() in the past.

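The required call order for each buffer can be written as a small state machine. This is a hypothetical sketch: model_buf and the model_*() functions stand in for the buffer head and for journal_get_{create,write,undo}_access(), journal_dirty_{meta,}data() and journal_forget(), and the saved array stands in for the copy of the unmodified data.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of one buffer's life inside a transaction. */
enum buf_state { BUF_IDLE, BUF_ACCESSED, BUF_DIRTY, BUF_FORGOTTEN };

struct model_buf {
    enum buf_state state;
    char data[16];   /* the block contents */
    char saved[16];  /* copy of the unmodified contents */
};

/* Stands in for journal_get_*_access(): must come before any change. */
int model_get_access(struct model_buf *b)
{
    assert(b != NULL);
    if (b->state != BUF_IDLE)
        return -1;
    memcpy(b->saved, b->data, sizeof b->saved);  /* snapshot old data */
    b->state = BUF_ACCESSED;
    return 0;
}

/* Stands in for journal_dirty_{meta,}data(): only after access. */
int model_dirty(struct model_buf *b)
{
    if (b->state != BUF_ACCESSED)
        return -1;
    b->state = BUF_DIRTY;
    return 0;
}

/* Stands in for journal_forget(): the buffer need not reach the device. */
int model_forget(struct model_buf *b)
{
    if (b->state != BUF_ACCESSED)
        return -1;
    b->state = BUF_FORGOTTEN;
    return 0;
}
```

Unlike the real layer, which copies only when it has to, this model always takes a snapshot, and it only lets you forget a buffer that was accessed but not dirtied.
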
A journal_flush() may be called at any time to commit and checkpoint all your transactions.

Then at umount time, in your put_super() (2.4) or write_super() (2.5), you can call journal_destroy() to clean up your in-core journal object.

Unfortunately there are a couple of ways the journal layer can cause a deadlock. The first thing to note is that each task can only have a single outstanding transaction at any one time; remember, nothing commits until the outermost journal_stop(). This means you must complete the transaction at the end of each file/inode/address etc. operation you perform, so that the journalling system isn't re-entered on another journal. Transactions can't be nested or batched across differing journals, and a filesystem other than yours (say ext3) may be modified in a later syscall.

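The one-outstanding-transaction rule amounts to each task having a single handle slot that stays tied to one journal until the outermost stop. This is a hypothetical model: model_task, model_journal and their functions are invented names. Where the model returns -1, do not expect the real layer to report a friendly error; assume it treats the situation as a bug.

```c
#include <assert.h>
#include <stddef.h>

struct model_journal { const char *name; };  /* hypothetical journal */

/* Hypothetical per-task state: at most one outstanding handle. */
struct model_task {
    struct model_journal *journal;  /* journal of the open handle, or NULL */
    int nesting;                    /* start depth on that journal */
};

/* Returns 0 on success, or -1 if the task already holds a handle on a
 * different journal, which is the case the text warns against. */
int model_task_start(struct model_task *t, struct model_journal *j)
{
    if (t->journal != NULL && t->journal != j)
        return -1;
    t->journal = j;
    t->nesting++;
    return 0;
}

void model_task_stop(struct model_task *t)
{
    assert(t->nesting > 0);
    if (--t->nesting == 0)
        t->journal = NULL;  /* outermost stop frees the task's slot */
}
```

Finishing the transaction before the operation returns clears the slot, so a later syscall is free to start a transaction on another filesystem's journal.
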
The second case to bear in mind is that journal_start() can block if there isn't enough space in the journal for your transaction (based on the passed nblocks param). When it blocks it merely(!) needs to wait for transactions from other tasks to complete and be committed, so essentially we are waiting for journal_stop(). So to avoid deadlocks you must treat journal_start/stop() as if they were semaphores and include them in your semaphore ordering rules. Note that journal_extend() has similar blocking behaviour to journal_start(), so you can deadlock here just as easily as on journal_start().

Try to reserve the right number of blocks the first time. ;-)

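Treating journal_start()/journal_stop() as a semaphore means giving the journal a place in your lock hierarchy and always acquiring in that order. Below is a minimal lock-order checker under stated assumptions: the ranks, RANK_FS_SEM and struct lock_order are hypothetical, and putting the journal after a filesystem semaphore is an arbitrary choice for the example, not a claim about ext3's ordering. Equal ranks are allowed so that a nested journal_start() on the same journal is not flagged.

```c
#include <assert.h>

/* Hypothetical lock ranks; lower ranks must be taken first. */
enum lock_rank {
    RANK_FS_SEM = 1,   /* some filesystem semaphore (hypothetical) */
    RANK_JOURNAL = 2,  /* held from journal_start() to journal_stop() */
};

/* Hypothetical per-task stack of ranks currently held. */
struct lock_order {
    int held[8];
    int depth;
};

/* Returns 0 if taking `rank` respects the ordering, -1 if it would
 * acquire a lower-ranked lock while holding a higher-ranked one. */
int order_acquire(struct lock_order *o, int rank)
{
    assert(o->depth < 8);
    if (o->depth > 0 && rank < o->held[o->depth - 1])
        return -1;
    o->held[o->depth++] = rank;
    return 0;
}

void order_release(struct lock_order *o)
{
    assert(o->depth > 0);
    o->depth--;
}
```

If one path takes the semaphore and then starts a transaction while another starts a transaction and then waits for the semaphore, each can end up waiting on the other; a checker like this flags the second path.
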
Another wrinkle to watch out for is your on-disk block allocation strategy. Why? Because if you undo a delete, you need to ensure you haven't reused any of the freed blocks in a later transaction. One simple way of doing this is to make sure any blocks you allocate only have checkpointed transactions listed against them. Ext3 does this in ext3_test_allocatable().

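One way to express the allocation check is to require that a block be free both in the current bitmap and in the last committed copy of it. As I understand it, this is roughly the idea behind ext3_test_allocatable(), which also consults the committed bitmap; block_allocatable() and its bitmap layout (one bit per block, 1 meaning in use) are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Returns 1 if bit `nr` is set, i.e. the block is in use. */
static int bit_is_set(const unsigned char *map, unsigned nr)
{
    return (map[nr / CHAR_BIT] >> (nr % CHAR_BIT)) & 1;
}

/* A block may be handed out only if it is free now AND was already free
 * at the last commit, so a block released by a not-yet-committed delete
 * is never reused. `committed` may be NULL if there is no uncommitted
 * change to this bitmap. */
int block_allocatable(const unsigned char *bitmap,
                      const unsigned char *committed, unsigned nr)
{
    assert(bitmap != NULL);
    if (bit_is_set(bitmap, nr))
        return 0;
    if (committed != NULL && bit_is_set(committed, nr))
        return 0;
    return 1;
}
```

A block that a pending transaction has just freed is still set in the committed copy, so it stays off-limits until that delete can no longer be undone.
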
Locking is also provided through journal_{un,}lock_updates(); ext3 uses this when it wants a window with a clean and stable fs for a moment. E.g.:

        journal_lock_updates(journal);   /* stop new stuff happening */
        journal_flush(journal);          /* checkpoint everything */
        /* ... do stuff on the stable fs ... */
        journal_unlock_updates(journal); /* carry on with filesystem use */

The opportunities for abuse and DOS attacks with this should be obvious if you allow unprivileged userspace to trigger codepaths containing these calls.

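The reason these calls invite abuse is that, once updates are locked, every new journal_start() on that journal has to wait. This is a hypothetical single-threaded model of that barrier: model_jnl and its functions are invented, and where the real calls would sleep, the model instead returns -EAGAIN (for a new update) or 0 (from model_lock_updates() while handles are still running).

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical journal state for the barrier model. */
struct model_jnl {
    int running;  /* handles between start and stop */
    int barrier;  /* nonzero while updates are locked out */
};

int model_start_update(struct model_jnl *j)
{
    if (j->barrier)
        return -EAGAIN;  /* the real journal_start() would wait here */
    j->running++;
    return 0;
}

void model_stop_update(struct model_jnl *j)
{
    assert(j->running > 0);
    j->running--;
}

/* Locks out new updates; returns 1 once the fs is quiescent, 0 while
 * already-running handles still have to finish (the real call waits). */
int model_lock_updates(struct model_jnl *j)
{
    j->barrier = 1;
    return j->running == 0;
}

void model_unlock_updates(struct model_jnl *j)
{
    assert(j->barrier);
    j->barrier = 0;
}
```

For as long as the barrier is held, every writer on that filesystem stalls, which is exactly what an unprivileged caller must not be able to arrange at will.
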
Summary

Using the journal is a matter of wrapping each of the different contexts (each mount, each modification, i.e. transaction, and each changed buffer) with calls that tell the journalling layer about them.

Here is some pseudocode to give you an idea of how it works:

  journal_t *my_jrnl;
  handle_t *handle;

  my_jrnl = journal_init_{dev,inode}(...);
  if (new_journal) {
          journal_create(my_jrnl);      /* lay out a fresh journal */
  } else {
          if (clean)                    /* skip after an unclean umount */
                  journal_wipe(my_jrnl, ...);
          journal_load(my_jrnl);        /* recovers if necessary */
  }

  foreach (transaction) {       /* transactions must be completed
                                   before a syscall returns to
                                   userspace */
          handle = journal_start(my_jrnl, nblocks);
          foreach (bh) {
                  journal_get_{create,write,undo}_access(handle, bh);
                  if (myfs_modify(bh)) {  /* returns true if it
                                             makes changes */
                          journal_dirty_{meta,}data(handle, bh);
                  } else {
                          journal_forget(handle, bh);
                  }
          }
          journal_stop(handle);
  }
  journal_destroy(my_jrnl);

Data Types

The journalling layer uses typedefs to 'hide' the concrete definitions of the structures used. As a client of the JBD layer you can just rely on using the pointer as a magic cookie of some sort.

Obviously the hiding is not enforced, as this is 'C'.

Structures
!Iinclude/linux/jbd.h

Functions

The functions here are split into two groups: those that affect a journal as a whole, and those which are used to manage transactions.

Journal Level
!Efs/jbd/journal.c
!Efs/jbd/recovery.c

Transaction Level
!Efs/jbd/transaction.c

See also

Journaling the Linux ext2fs Filesystem, LinuxExpo 98, Stephen Tweedie

Ext3 Journalling FileSystem, OLS 2000, Dr. Stephen Tweedie