Network Block Device (TCP version)

Note: Network Block Device is now experimental, which approximately
means that it works on my computer, and it worked on one of the school
computers.

What is it: With this compiled into the kernel, Linux can use a remote
server as one of its block devices. Every time the client computer
wants to read /dev/nd0, it sends a request over TCP to the server, which
replies with the data read. This can be used by stations with
low disk space (or even diskless ones, if you boot from floppy) to
borrow disk space from another computer. Unlike NFS, it is possible to
put any filesystem on it. It is impossible to use NBD as a root
filesystem, since it requires a user-level program to start. It also
allows you to run a block device in user land (making server and client
physically the same computer, communicating over loopback).

Current state: It currently works. Network block device seems to be
pretty stable. I originally thought that it was impossible to swap
over TCP. That turned out not to be true: swapping over TCP now works
and seems to be deadlock-free, but it requires heavy patches to
Linux's network layer.

Devices: Network block device uses major 43, minors 0..n (where n is
configurable in nbd.h). Create these files with mknod when needed. After
that, the output of ls -l /dev/ should look like:

brw-rw-rw-   1 root     root      43,   0 Apr 11 00:28 nd0
brw-rw-rw-   1 root     root      43,   1 Apr 11 00:28 nd1
...
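
The mknod step can also be scripted. The following is only a sketch: the
function names plan_nbd_nodes and create_nbd_nodes are hypothetical, and it
assumes a Unix system where Python's os.mknod and os.makedev are available.
The major number 43 and the ndN names come from the text above.

```python
import os
import stat

NBD_MAJOR = 43  # major number stated in the text


def plan_nbd_nodes(count, dev_dir="/dev"):
    """Return (path, mode, device) tuples for nd0 .. nd<count-1>."""
    # Block device, read/write for everyone, matching the listing above.
    mode = stat.S_IFBLK | 0o666
    return [
        (os.path.join(dev_dir, f"nd{minor}"), mode, os.makedev(NBD_MAJOR, minor))
        for minor in range(count)
    ]


def create_nbd_nodes(count, dev_dir="/dev"):
    """Create the planned nodes. Needs root; existing nodes are left alone."""
    for path, mode, device in plan_nbd_nodes(count, dev_dir):
        if not os.path.exists(path):
            # The permission bits actually applied are still subject to umask.
            os.mknod(path, mode, device)
```

Calling create_nbd_nodes(2) as root would create /dev/nd0 and /dev/nd1;
plan_nbd_nodes(2) only computes the names and device numbers and touches
nothing.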

Protocol: The userland program passes a file handle with a connected TCP
socket to the actual kernel driver. This way, the kernel does not have to
care about connecting etc. The protocol is rather simple: if the driver is
asked to read from the block device, it sends a packet of the following
form, "request" (all data are in network byte order):

  __u32 magic;        must be equal to 0x12560953
  __u32 from;         position in bytes to read from / write at
  __u32 len;          number of bytes to be read / written
  __u64 handle;       handle of operation
  __u32 type;         0 = read
                      1 = write
  ...                 in case of write operation, this is
                      immediately followed by len bytes of data
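
To make the request layout concrete, here is a sketch that packs such a
packet with Python's struct module. It takes the field order and widths
literally from the description above and assumes the fields are tightly
packed with no padding; the actual driver headers may differ. The names
build_request, NBD_READ and NBD_WRITE are made up for this illustration.

```python
import struct

REQUEST_MAGIC = 0x12560953   # value given in the text
# "!" selects network byte order and no padding:
# magic (u32), from (u32), len (u32), handle (u64), type (u32)
REQUEST_FORMAT = "!IIIQI"
NBD_READ, NBD_WRITE = 0, 1   # hypothetical names for the two type values


def build_request(offset, length, handle, op, payload=b""):
    """Return the bytes of one request, including write data if any."""
    if op == NBD_WRITE and len(payload) != length:
        raise ValueError("a write must carry exactly len bytes of data")
    if op == NBD_READ and payload:
        raise ValueError("a read request carries no data")
    header = struct.pack(REQUEST_FORMAT, REQUEST_MAGIC, offset, length, handle, op)
    return header + payload
```

With this layout the header is 24 bytes long, so a write of 3 bytes yields a
27-byte packet. Note that a 32-bit "from" field limits the addressable
device size to 4 GB.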

When the operation is completed, the server responds with a packet of the
following structure, "reply":

  __u32 magic;        must be equal to
  __u64 handle;       handle copied from request
  __u32 error;        0 = operation completed successfully,
                      else error code
  ...                 in case of read operation with no error,
                      this is immediately followed by len bytes of data
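
A matching sketch for decoding a reply follows. It again assumes tightly
packed fields in network byte order. Because the text does not state the
reply magic value, the hypothetical parse_reply function takes it as an
optional parameter rather than hard-coding one; request_len is the len of
the original read request (leave it at 0 for writes).

```python
import struct

# magic (u32), handle (u64), error (u32), network byte order, no padding
REPLY_FORMAT = "!IQI"
REPLY_SIZE = struct.calcsize(REPLY_FORMAT)


def parse_reply(data, request_len=0, expected_magic=None):
    """Return (handle, error, payload) decoded from the bytes in data."""
    if len(data) < REPLY_SIZE:
        raise ValueError("reply header is incomplete")
    magic, handle, error = struct.unpack_from(REPLY_FORMAT, data)
    if expected_magic is not None and magic != expected_magic:
        raise ValueError("unexpected reply magic")
    payload = b""
    # Data follows the header only for a read that completed without error.
    if error == 0 and request_len:
        payload = data[REPLY_SIZE:REPLY_SIZE + request_len]
        if len(payload) != request_len:
            raise ValueError("reply data is shorter than the requested len")
    return handle, error, payload
```

The returned handle is what lets the client match a reply to the request it
sent earlier.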

For more information, look at http://nbd.sf.net/.