Mandatory File Locking For The Linux Operating System

Andy Walker

15 April 1996

1. What is mandatory locking?
------------------------------

Mandatory locking is kernel enforced file locking, as opposed to the more usual
cooperative file locking used to guarantee sequential access to files among
processes. File locks are applied using the flock() and fcntl() system calls
(and the lockf() library routine, which is a wrapper around fcntl()). It is
normally a process's responsibility to check for locks on a file it wishes to
update, before applying its own lock, updating the file and unlocking it again.
The most commonly used example of this (and in the case of sendmail, the most
troublesome) is access to a user's mailbox. The mail user agent and the mail
transfer agent must guard against updating the mailbox at the same time, and
prevent reading the mailbox while it is being updated.
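
As an illustration of the cooperative protocol just described, here is a
minimal C sketch of the lock, update, unlock sequence using fcntl() record
locks on a POSIX system. The helper names whole_file_lock() and
append_record() are hypothetical. Note that nothing here stops a badly
behaved process from writing without taking the lock first; that is exactly
the gap mandatory locking tries to close.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Apply (F_RDLCK/F_WRLCK) or release (F_UNLCK) an advisory fcntl()
 * lock on the whole file. l_len = 0 means "to end of file, however
 * far it grows". F_SETLKW sleeps until any conflicting lock held by
 * another process is released; retry if a signal interrupts it. */
static int whole_file_lock(int fd, short type)
{
    struct flock fl;

    memset(&fl, 0, sizeof fl);
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;
    while (fcntl(fd, F_SETLKW, &fl) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return 0;
}

/* Cooperative update: lock, append, unlock. This protects the file
 * only against other processes that follow the same protocol. */
int append_record(int fd, const char *rec)
{
    size_t len = strlen(rec);
    ssize_t n = -1;
    int saved_errno;

    if (whole_file_lock(fd, F_WRLCK) == -1)
        return -1;
    if (lseek(fd, 0, SEEK_END) != (off_t)-1)
        n = write(fd, rec, len);
    saved_errno = errno;          /* keep the write's errno across unlock */
    whole_file_lock(fd, F_UNLCK);
    errno = saved_errno;
    return n == (ssize_t)len ? 0 : -1;
}
```

A mail transfer agent delivering a message would call append_record() on the
mailbox descriptor; a mail user agent following the same convention would take
a read or write lock the same way before touching the file.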

In a perfect world all processes would use and honour a cooperative, or
"advisory", locking scheme. However, the world isn't perfect, and there's
a lot of poorly written code out there.

In trying to address this problem, the designers of System V UNIX came up
with a "mandatory" locking scheme, whereby the operating system kernel would
block attempts by a process to write to a file that another process holds a
"read" -or- "shared" lock on, and block attempts to both read and write to a
file that a process holds a "write" -or- "exclusive" lock on.

The System V mandatory locking scheme was intended to have as little impact as
possible on existing user code. The scheme is based on marking individual files
as candidates for mandatory locking, and using the existing fcntl()/lockf()
interface for applying locks just as if they were normal, advisory locks.
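
Because the interface is unchanged, locking code like the following sketch
needs no modification to take part in mandatory locking: whether the kernel
treats the lock as advisory or mandatory depends only on how the file itself
is marked (see section 2). The helpers open_and_lockf() and
unlock_and_close() are hypothetical names; lockf() and its F_LOCK, F_ULOCK
and F_TEST commands are the standard POSIX ones.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Open (creating if needed) and lock a whole file with lockf().
 * lockf() locks are exclusive, so the descriptor must be writable.
 * Returns the descriptor, or -1 on error. */
int open_and_lockf(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);

    if (fd == -1)
        return -1;
    /* lockf() covers the region starting at the current offset; a
     * length of 0 extends it to the end of the file and beyond. A
     * freshly opened descriptor is at offset 0, so this locks the
     * whole file, including anything appended later. F_LOCK waits;
     * F_TLOCK would fail at once instead. */
    if (lockf(fd, F_LOCK, 0) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Release the whole-file lock and close. Closing any descriptor for
 * the file releases the process's locks on it anyway; unlocking
 * first just makes the intent explicit. */
int unlock_and_close(int fd)
{
    int rc = 0;

    if (lseek(fd, 0, SEEK_SET) == (off_t)-1 || lockf(fd, F_ULOCK, 0) == -1)
        rc = -1;
    if (close(fd) == -1)
        rc = -1;
    return rc;
}
```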

Note 1: In saying "file" in the paragraphs above I am actually not telling
the whole truth. System V locking is based on fcntl(). The granularity of
fcntl() is such that it allows the locking of byte ranges in files, in addition
to entire files, so the mandatory locking rules also have byte level
granularity.
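
A minimal sketch of that byte-level granularity, assuming a POSIX system: the
parent write-locks bytes 0 to 9, and a forked child asks with F_GETLK whether a
write lock on some other range would conflict. lock_range() and
probe_from_child() are hypothetical helpers. The probe has to run in a
different process because F_GETLK never reports the caller's own locks, and
fcntl() locks are not inherited by a child created with fork().

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Lock (F_RDLCK/F_WRLCK) or unlock (F_UNLCK) len bytes from start.
 * F_SETLK fails with EACCES or EAGAIN instead of waiting. */
int lock_range(int fd, short type, off_t start, off_t len)
{
    struct flock fl;

    memset(&fl, 0, sizeof fl);
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = start;
    fl.l_len = len;
    return fcntl(fd, F_SETLK, &fl);
}

/* Would a write lock on [start, start + len) conflict with a lock held
 * by this (parent) process? Returns the blocking lock's type (F_RDLCK
 * or F_WRLCK), F_UNLCK if nothing is in the way, or -1 on error. */
int probe_from_child(int fd, off_t start, off_t len)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        struct flock fl;

        memset(&fl, 0, sizeof fl);
        fl.l_type = F_WRLCK;
        fl.l_whence = SEEK_SET;
        fl.l_start = start;
        fl.l_len = len;
        if (fcntl(fd, F_GETLK, &fl) == -1)
            _exit(255);
        _exit(fl.l_type);   /* the small lock-type value fits an exit status */
    }
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status) ||
        WEXITSTATUS(status) == 255)
        return -1;
    return WEXITSTATUS(status);
}
```

With bytes 0 to 9 write-locked, a probe of byte 5 reports F_WRLCK while a probe
of bytes 10 to 19 reports F_UNLCK: only the locked range is affected.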

Note 2: POSIX.1 does not specify any scheme for mandatory locking, despite
borrowing the fcntl() locking scheme from System V. The mandatory locking
scheme is defined by the System V Interface Definition (SVID) Version 3.

2. Marking a file for mandatory locking
---------------------------------------

A file is marked as a candidate for mandatory locking by setting the
set-group-ID bit in its file mode but removing the group-execute bit. This is
an otherwise meaningless combination, and was chosen by the System V
implementors so as not to break existing user programs.
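
In shell terms the marking is "chmod g+s,g-x file". The C sketch below does
the same with stat() and chmod(), and includes the test for the marking
itself; the helper names are hypothetical. One caveat worth knowing: if an
unprivileged caller is not a member of the file's group, chmod() silently
drops the set-group-ID bit instead of failing.

```c
#define _XOPEN_SOURCE 700
#include <sys/stat.h>
#include <sys/types.h>

/* Set-group-ID on and group-execute off: the combination that marks a
 * file as a mandatory locking candidate. */
int is_mandatory_candidate(mode_t mode)
{
    return (mode & (S_ISGID | S_IXGRP)) == S_ISGID;
}

/* The permission bits with the mandatory marking applied. File type
 * bits are masked off because chmod() only uses the low 12 bits. */
mode_t with_mandatory_mark(mode_t mode)
{
    return ((mode & 07777) | S_ISGID) & ~(mode_t)S_IXGRP;
}

/* Equivalent of "chmod g+s,g-x path". Returns 0 or -1 with errno set. */
int mark_for_mandatory_locking(const char *path)
{
    struct stat st;

    if (stat(path, &st) == -1)
        return -1;
    return chmod(path, with_mandatory_mark(st.st_mode));
}
```

For example, a file with mode 0754 ends up with mode 02744: the group keeps
its read permission but loses execute, and gains the set-group-ID bit.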

Note that the set-group-ID bit is usually automatically cleared by the kernel
when a setgid file is written to. This is a security measure. The kernel has
been modified to recognize the special case of a mandatory lock candidate and
to refrain from clearing this bit. Similarly the kernel has been modified not
to run mandatory lock candidates with setgid privileges.

3. Available implementations
----------------------------

I have considered the implementations of mandatory locking available with
SunOS 4.1.x, Solaris 2.x and HP-UX 9.x.

Generally I have tried to make the most sense out of the behaviour exhibited
by these three reference systems. There are many anomalies.

All the reference systems reject all calls to open() for a file on which
another process has outstanding mandatory locks. This is in direct
contravention of SVID 3, which states that only calls to open() with the
O_TRUNC flag set should be rejected. The Linux implementation follows the SVID
definition, which is the "Right Thing", since only calls with O_TRUNC can
modify the contents of the file.

HP-UX even disallows open() with O_TRUNC for a file with advisory locks, not
just mandatory locks. That would appear to contravene POSIX.1.

mmap() is another interesting case. All the operating systems mentioned
prevent mandatory locks from being applied to an mmap()'ed file, but HP-UX
also disallows advisory locks for such a file. SVID actually specifies the
paranoid HP-UX behaviour.

In my opinion only MAP_SHARED mappings should be immune from locking, and then
only from mandatory locks; that is what is currently implemented.

SunOS is so hopeless that it doesn't even honour the O_NONBLOCK flag for
mandatory locks, so reads and writes to locked files always block when they
should return EAGAIN.

I'm afraid that this is such an esoteric area that the semantics described
below are just as valid as any others, so long as the main points seem to
agree.

4. Semantics
------------

1. Mandatory locks can only be applied via the fcntl()/lockf() locking
   interface, in other words the System V/POSIX interface. BSD style
   locks using flock() never result in a mandatory lock.

2. If a process has locked a region of a file with a mandatory read lock, then
   other processes are permitted to read from that region. If any of these
   processes attempts to write to the region it will block until the lock is
   released, unless the process has opened the file with the O_NONBLOCK
   flag, in which case the system call will return immediately with the error
   status EAGAIN.

3. If a process has locked a region of a file with a mandatory write lock, all
   attempts to read or write to that region block until the lock is released,
   unless a process has opened the file with the O_NONBLOCK flag, in which case
   the system call will return immediately with the error status EAGAIN.

4. Calls to open() with O_TRUNC, or to creat(), on an existing file that has
   any mandatory locks owned by other processes will be rejected with the
   error status EAGAIN.

5. Attempts to apply a mandatory lock to a file that is memory mapped and
   shared (via mmap() with MAP_SHARED) will be rejected with the error status
   EAGAIN.

6. Attempts to create a shared memory map of a file (via mmap() with MAP_SHARED)
   that has any mandatory locks in effect will be rejected with the error status
   EAGAIN.
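
Rules 2 and 3 amount to a small conflict table, and the O_NONBLOCK behaviour
is what a caller has to be prepared for. The sketch below models the table in
mand_lock_blocks() and shows a write wrapper that reports EAGAIN as "try again
later". Both names are hypothetical, and the model covers only locks owned by
other processes.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Rules 2 and 3 as a table: would a mandatory lock of type `held`,
 * owned by another process, stop a read (is_write == false) or a
 * write (is_write == true) to the locked region? */
bool mand_lock_blocks(short held, bool is_write)
{
    switch (held) {
    case F_RDLCK:
        return is_write;    /* read lock: other readers still allowed */
    case F_WRLCK:
        return true;        /* write lock: nobody else reads or writes */
    default:
        return false;       /* F_UNLCK: no lock in the way */
    }
}

/* Write on a descriptor opened with O_NONBLOCK. If a mandatory lock is
 * in the way the call fails with EAGAIN instead of sleeping; report
 * that through *deferred so the caller can retry later. When no
 * mandatory lock is enforced, the write simply proceeds. */
ssize_t write_or_defer(int fd, const void *buf, size_t len, bool *deferred)
{
    ssize_t n = write(fd, buf, len);

    *deferred = (n == -1 && errno == EAGAIN);
    return n;
}
```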

5. Which system calls are affected?
-----------------------------------

Those which read or modify a file's contents, not just the inode. That gives
read(), write(), readv(), writev(), open(), creat(), mmap(), truncate() and
ftruncate(). truncate() and ftruncate() are considered to be "write" actions
for the purposes of mandatory locking.

The affected region is usually defined as stretching from the current position
for the total number of bytes read or written. For the truncate calls it is
defined as the bytes of a file removed or added (we must also consider bytes
added, as a lock can specify just "the whole file", rather than a specific
range of bytes).
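
The region rules can be written down as a few small functions. This is a
sketch under stated assumptions: regions are half-open [start, end), locks are
described struct flock style by a start and a length with a length of 0
meaning "to end of file", negative lengths are not handled, and all names
(struct range, io_region(), truncate_region(), lock_overlaps()) are
hypothetical.

```c
#include <stdbool.h>
#include <sys/types.h>

/* A byte region, half-open: [start, end). */
struct range {
    off_t start;
    off_t end;
};

/* read(), write(), readv(), writev(): from the current position for
 * the number of bytes transferred. */
struct range io_region(off_t pos, size_t count)
{
    struct range r = { pos, pos + (off_t)count };
    return r;
}

/* truncate(), ftruncate(): the bytes removed when shrinking, or the
 * bytes added when growing, between the old size and the new size. */
struct range truncate_region(off_t old_size, off_t new_size)
{
    struct range r;

    r.start = old_size < new_size ? old_size : new_size;
    r.end = old_size < new_size ? new_size : old_size;
    return r;
}

/* Does a lock (l_start, l_len) touch the region? l_len == 0 means
 * "from l_start to end of file". Negative l_len is not handled. */
bool lock_overlaps(off_t l_start, off_t l_len, struct range r)
{
    if (r.start >= r.end)
        return false;               /* empty region: nothing to check */
    if (l_len == 0)
        return r.end > l_start;     /* open-ended lock */
    return r.start < l_start + l_len && l_start < r.end;
}
```

For example, lock_overlaps(0, 0, truncate_region(400, 1000)) is true: a
whole-file lock covers the bytes that ftruncate() adds when it grows a
400-byte file to 1000 bytes, which is why added bytes have to be considered.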

Note 3: I may have overlooked some system calls that need mandatory lock
checking in my eagerness to get this code out the door. Please let me know, or
better still fix the system calls yourself and submit a patch to me or Linus.

6. Warning!
-----------

Not even root can override a mandatory lock, so runaway processes can wreak
havoc if they lock crucial files. The way around it is to change the file
permissions (remove the setgid bit) before trying to read or write to it.
Of course, that might be a bit tricky if the system is hung :-(
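
A sketch of that escape hatch, equivalent to "chmod g-s file" in the shell:
clearing the set-group-ID bit means the file no longer has the mandatory
locking marking. unmark_mandatory() and without_setgid() are hypothetical
names, and the caller must own the file or be root for chmod() to succeed.

```c
#define _XOPEN_SOURCE 700
#include <sys/stat.h>
#include <sys/types.h>

/* Permission bits with set-group-ID cleared. File type bits are
 * masked off because chmod() only uses the low 12 mode bits. */
mode_t without_setgid(mode_t mode)
{
    return (mode & 07777) & ~(mode_t)S_ISGID;
}

/* Equivalent of "chmod g-s path". Returns 0, or -1 with errno set. */
int unmark_mandatory(const char *path)
{
    struct stat st;

    if (stat(path, &st) == -1)
        return -1;
    return chmod(path, without_setgid(st.st_mode));
}
```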