// Copyright 2009 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

/*
Package gob manages streams of gobs - binary values exchanged between an
Encoder (transmitter) and a Decoder (receiver). A typical use is transporting
arguments and results of remote procedure calls (RPCs) such as those provided by
package "rpc".

A stream of gobs is self-describing. Each data item in the stream is preceded by
a specification of its type, expressed in terms of a small set of predefined
types. Pointers are not transmitted, but the things they point to are
transmitted; that is, the values are flattened. Recursive types work fine, but
recursive values (data with cycles) are problematic. This may change.

To use gobs, create an Encoder and present it with a series of data items as
values or addresses that can be dereferenced to values. The Encoder makes sure
all type information is sent before it is needed. At the receive side, a
Decoder retrieves values from the encoded stream and unpacks them into local
variables.
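
A minimal sketch of that flow, assuming the package is importable as
encoding/gob (its path in current Go releases; the tree this file comes from
may use a different path). Point and roundTrip are hypothetical names, and a
bytes.Buffer stands in for the connection:

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
	"log"
)

// Point is a hypothetical type used only for this illustration.
type Point struct {
	X, Y int
}

// roundTrip encodes p into an in-memory buffer and decodes it into a fresh
// variable, standing in for a transmitter and a receiver.
func roundTrip(p Point) (Point, error) {
	var network bytes.Buffer // stands in for a network connection

	enc := gob.NewEncoder(&network)
	if err := enc.Encode(p); err != nil {
		return Point{}, err
	}

	var q Point
	dec := gob.NewDecoder(&network)
	if err := dec.Decode(&q); err != nil {
		return Point{}, err
	}
	return q, nil
}

func main() {
	q, err := roundTrip(Point{22, 33})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(q.X, q.Y)
}
```

Encode accepts either the value or its address; Decode always needs an
address so that it has somewhere to store the result.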

The source and destination values/types need not correspond exactly. For structs,
fields (identified by name) that are in the source but absent from the receiving
variable will be ignored. Fields that are in the receiving variable but missing
from the transmitted type or value will be ignored in the destination. If a field
with the same name is present in both, their types must be compatible. Both the
receiver and transmitter will do all necessary indirection and dereferencing to
convert between gobs and actual Go values. For instance, a gob type that is
schematically,

	struct { A, B int }

can be sent from or received into any of these Go types:

	struct { A, B int }	// the same
	*struct { A, B int }	// extra indirection of the struct
	struct { *A, **B int }	// extra indirection of the fields
	struct { A, B int64 }	// different concrete value type; see below

It may also be received into any of these:

	struct { A, B int }	// the same
	struct { B, A int }	// ordering doesn't matter; matching is by name
	struct { A, B, C int }	// extra field (C) ignored
	struct { B int }	// missing field (A) ignored; data will be dropped
	struct { B, C int }	// missing field (A) ignored; extra field (C) ignored.

Attempting to receive into these types will draw a decode error:

	struct { A int; B uint }	// change of signedness for B
	struct { A int; B float }	// change of type for B
	struct { }			// no field names in common
	struct { C, D int }		// no field names in common
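
The matching rules above can be tried out with a sketch like the following.
It assumes the encoding/gob import path of current Go releases; Sent, Partial,
Unsigned and transfer are hypothetical names introduced for the illustration:

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Sent is the transmitted type.
type Sent struct {
	A, B int
}

// Partial shares only field B with Sent and adds a field C of its own.
type Partial struct {
	B, C int
}

// Unsigned declares B as uint, which is incompatible with the int sent.
type Unsigned struct {
	A int
	B uint
}

// transfer encodes src and decodes the result into dst, which must be a
// pointer.
func transfer(src, dst interface{}) error {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(src); err != nil {
		return err
	}
	return gob.NewDecoder(&buf).Decode(dst)
}

func main() {
	p := Partial{C: 99}
	err := transfer(Sent{A: 1, B: 2}, &p)
	fmt.Println(p.B, p.C, err) // A is dropped, B is copied, C is left alone

	var u Unsigned
	err = transfer(Sent{A: 1, B: 2}, &u)
	fmt.Println(err != nil) // the change of signedness is reported as an error
}
```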

Integers are transmitted two ways: arbitrary precision signed integers or
arbitrary precision unsigned integers. There is no int8, int16 etc.
discrimination in the gob format; there are only signed and unsigned integers. As
described below, the transmitter sends the value in a variable-length encoding;
the receiver accepts the value and stores it in the destination variable.
Floating-point numbers are always sent using IEEE-754 64-bit precision (see
below).

Signed integers may be received into any signed integer variable: int, int16, etc.;
unsigned integers may be received into any unsigned integer variable; and floating
point values may be received into any floating point variable. However,
the destination variable must be able to represent the value or the decode
operation will fail.

Structs, arrays and slices are also supported. Strings and arrays of bytes are
supported with a special, efficient representation (see below). When a slice is
decoded, if the existing slice has capacity the slice will be extended in place;
if not, a new array is allocated. Regardless, the length of the resulting slice
reports the number of elements decoded.

Functions and channels cannot be sent in a gob. Attempting
to encode a value that contains one will fail.

The rest of this comment documents the encoding, details that are not important
for most users. Details are presented bottom-up.

An unsigned integer is sent one of two ways. If it is less than 128, it is sent
as a byte with that value. Otherwise it is sent as a minimal-length big-endian
(high byte first) byte stream holding the value, preceded by one byte holding the
byte count, negated. Thus 0 is transmitted as (00), 7 is transmitted as (07) and
256 is transmitted as (FE 01 00).
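
The rule for unsigned integers can be written out as a short standalone
sketch. This is not the package's internal routine; encodeUnsigned and
hexString are illustrative names, and the 64-bit width is an assumption made
for the example:

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/bits"
)

// encodeUnsigned returns the encoding of u following the rule above.
func encodeUnsigned(u uint64) []byte {
	if u < 128 {
		return []byte{byte(u)} // small values are sent as a single byte
	}
	var buf [9]byte
	binary.BigEndian.PutUint64(buf[1:], u) // big-endian value in buf[1..8]
	n := (bits.Len64(u) + 7) / 8           // number of significant bytes
	buf[8-n] = byte(-n)                    // negated byte count precedes them
	return buf[8-n:]
}

// hexString formats b as space-separated upper-case hex, matching the
// notation used in this comment.
func hexString(b []byte) string {
	return fmt.Sprintf("% X", b)
}

func main() {
	for _, u := range []uint64{0, 7, 256} {
		fmt.Printf("%d -> (%s)\n", u, hexString(encodeUnsigned(u)))
	}
}
```

Because every single-byte value below 128 is a literal, a first byte of 128 or
more can only be a negated count, so the two forms never collide.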

A boolean is encoded within an unsigned integer: 0 for false, 1 for true.

A signed integer, i, is encoded within an unsigned integer, u. Within u, bits 1
upward contain the value; bit 0 says whether they should be complemented upon
receipt. The encode algorithm looks like this:

	var u uint
	if i < 0 {
		u = (^uint(i) << 1) | 1	// complement i, bit 0 is 1
	} else {
		u = (uint(i) << 1)	// do not complement i, bit 0 is 0
	}
	encodeUnsigned(u)

The low bit is therefore analogous to a sign bit, but making it the complement bit
instead guarantees that the largest negative integer is not a special case. For
example, -129=^128=(^256>>1) encodes as (FE 01 01).
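
The mapping between signed values and their unsigned carriers, in both
directions, can be sketched as follows. The names toUnsigned and fromUnsigned
are illustrative, and the 64-bit width is an assumption made for the example:

```go
package main

import "fmt"

// toUnsigned maps a signed integer onto the unsigned integer that carries it:
// the value lives in bits 1 upward and bit 0 is the complement flag.
func toUnsigned(i int64) uint64 {
	if i < 0 {
		return uint64(^i)<<1 | 1
	}
	return uint64(i) << 1
}

// fromUnsigned reverses toUnsigned on the receiving side.
func fromUnsigned(u uint64) int64 {
	if u&1 != 0 {
		return ^int64(u >> 1)
	}
	return int64(u >> 1)
}

func main() {
	for _, i := range []int64{22, 33, -65, -129} {
		u := toUnsigned(i)
		fmt.Printf("%d -> %#x -> %d\n", i, u, fromUnsigned(u))
	}
}
```

For -129 the carrier is 0x101, which the unsigned rule then sends as
(FE 01 01), in agreement with the example above.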

Floating-point numbers are always sent as a representation of a float64 value.
That value is converted to a uint64 using math.Float64bits. The uint64 is then
byte-reversed and sent as a regular unsigned integer. The byte-reversal means the
exponent and high-precision part of the mantissa go first. Since the low bits are
often zero, this can save encoding bytes. For instance, 17.0 is encoded in only
three bytes (FE 31 40).
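
A small sketch of the byte reversal, using math.Float64bits together with
bits.ReverseBytes64 from math/bits. The helper names floatToWire and
wireToFloat are made up for the illustration:

```go
package main

import (
	"fmt"
	"math"
	"math/bits"
)

// floatToWire returns the unsigned integer that is transmitted for f:
// the IEEE-754 bits of the float64 with their byte order reversed.
func floatToWire(f float64) uint64 {
	return bits.ReverseBytes64(math.Float64bits(f))
}

// wireToFloat undoes floatToWire on receipt.
func wireToFloat(u uint64) float64 {
	return math.Float64frombits(bits.ReverseBytes64(u))
}

func main() {
	u := floatToWire(17.0)
	fmt.Printf("%#x %v\n", u, wireToFloat(u))
}
```

The bits of 17.0 are 0x4031000000000000; reversed they become 0x3140, a
two-byte unsigned integer that is sent with its count byte as (FE 31 40).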

Strings and slices of bytes are sent as an unsigned count followed by that many
uninterpreted bytes of the value.

All other slices and arrays are sent as an unsigned count followed by that many
elements using the standard gob encoding for their type, recursively.

Maps are sent as an unsigned count followed by that many key, element
pairs. Empty but non-nil maps are sent, so if the sender has allocated
a map, the receiver will allocate a map even if no elements are
transmitted.
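
The behavior for empty maps can be observed with a sketch along these lines,
assuming the encoding/gob import path of current Go releases. Inventory and
decodeCopy are hypothetical names:

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
	"log"
)

// Inventory is a hypothetical type with a map-valued field.
type Inventory struct {
	Counts map[string]int
}

// decodeCopy sends in through an in-memory buffer and returns what a
// receiver starting from a zero Inventory ends up with.
func decodeCopy(in Inventory) (Inventory, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(in); err != nil {
		return Inventory{}, err
	}
	var out Inventory
	err := gob.NewDecoder(&buf).Decode(&out)
	return out, err
}

func main() {
	out, err := decodeCopy(Inventory{Counts: map[string]int{}})
	if err != nil {
		log.Fatal(err)
	}
	// The sender allocated an empty map, so the receiver holds one too.
	fmt.Println(out.Counts != nil, len(out.Counts))
}
```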

Structs are sent as a sequence of (field number, field value) pairs. The field
value is sent using the standard gob encoding for its type, recursively. If a
field has the zero value for its type, it is omitted from the transmission. The
field number is defined by the type of the encoded struct: the first field of the
encoded type is field 0, the second is field 1, etc. When encoding a value, the
field numbers are delta encoded for efficiency and the fields are always sent in
order of increasing field number; the deltas are therefore unsigned. The
initialization for the delta encoding sets the field number to -1, so an unsigned
integer field 0 with value 7 is transmitted as unsigned delta = 1, unsigned value
= 7 or (01 07). Finally, after all the fields have been sent a terminating mark
denotes the end of the struct. That mark is a delta=0 value, which has
representation (00).

Interface types are not checked for compatibility; all interface types are
treated, for transmission, as members of a single "interface" type, analogous to
int or []byte - in effect they're all treated as interface{}. Interface values
are transmitted as a string identifying the concrete type being sent (a name
that must be pre-defined by calling Register), followed by a byte count of the
length of the following data (so the value can be skipped if it cannot be
stored), followed by the usual encoding of the concrete (dynamic) value stored in
the interface value. (A nil interface value is identified by the empty string
and transmits no value.) Upon receipt, the decoder verifies that the unpacked
concrete item satisfies the interface of the receiving variable.
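
A sketch of sending an interface value, assuming the encoding/gob import path
of current Go releases. Shape, Square and sendShape are hypothetical names;
Register is the call mentioned above:

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
	"log"
)

// Shape and Square are hypothetical types for this illustration.
type Shape interface {
	Area() float64
}

type Square struct {
	Side float64
}

func (s Square) Area() float64 { return s.Side * s.Side }

// sendShape transmits s as an interface value and returns what the
// receiver stores in its own Shape variable.
func sendShape(s Shape) (Shape, error) {
	// The concrete type must be registered so that its name can be
	// transmitted and resolved on receipt.
	gob.Register(Square{})

	var buf bytes.Buffer
	// Passing a pointer to the interface makes Encode see the interface
	// type rather than the concrete type stored inside it.
	if err := gob.NewEncoder(&buf).Encode(&s); err != nil {
		return nil, err
	}
	var out Shape
	if err := gob.NewDecoder(&buf).Decode(&out); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	out, err := sendShape(Square{Side: 3})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%T %v\n", out, out.Area())
}
```

In a real program the Register call would normally sit in an init function on
both the sending and the receiving side; it is inlined here only to keep the
sketch in one place.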

The representation of types is described below. When a type is defined on a given
connection between an Encoder and Decoder, it is assigned a signed integer type
id. When Encoder.Encode(v) is called, it makes sure there is an id assigned for
the type of v and all its elements and then it sends the pair (typeid, encoded-v)
where typeid is the type id of the encoded type of v and encoded-v is the gob
encoding of the value v.

To define a type, the encoder chooses an unused, positive type id and sends the
pair (-type id, encoded-type) where encoded-type is the gob encoding of a wireType
description, constructed from these types:

	type wireType struct {
		ArrayT  *arrayType
		SliceT  *sliceType
		StructT *structType
		MapT    *mapType
	}
	type arrayType struct {
		CommonType
		Elem typeId
		Len  int
	}
	type CommonType struct {
		Name string // the name of the struct type
		Id   int    // the id of the type, repeated so it's inside the type
	}
	type sliceType struct {
		CommonType
		Elem typeId
	}
	type structType struct {
		CommonType
		Field []*fieldType // the fields of the struct.
	}
	type fieldType struct {
		Name string // the name of the field.
		Id   int    // the type id of the field, which must be already defined
	}
	type mapType struct {
		CommonType
		Key  typeId
		Elem typeId
	}

If there are nested type ids, the types for all inner type ids must be defined
before the top-level type id is used to describe an encoded-v.

For simplicity in setup, the connection is defined to understand these types a
priori, as well as the basic gob types int, uint, etc. Their ids are:

	bool        1
	int         2
	uint        3
	float       4
	[]byte      5
	string      6
	complex     7
	interface   8
	// gap for reserved ids.
	WireType    16
	ArrayType   17
	CommonType  18
	SliceType   19
	StructType  20
	FieldType   21
	// 22 is slice of fieldType.
	MapType     23

Finally, each message created by a call to Encode is preceded by an encoded
unsigned integer count of the number of bytes remaining in the message. After
the initial type name, interface values are wrapped the same way; in effect, the
interface value acts like a recursive invocation of Encode.

In summary, a gob stream looks like

	(byteCount (-type id, encoding of a wireType)* (type id, encoding of a value))*

where * signifies zero or more repetitions and the type id of a value must
be predefined or be defined before the value in the stream.
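
Because every item is preceded by its byte count, a stream can be cut into
pieces without understanding their contents. The sketch below does only that;
it is not how the package's Decoder is written. readUint, splitMessages and
pointStream are illustrative names, and the input is the 40-byte encoding of
Point{22, 33} listed in the encoded example elsewhere in this file:

```go
package main

import (
	"bufio"
	"bytes"
	"errors"
	"fmt"
	"io"
	"log"
)

// readUint reads one gob-encoded unsigned integer from r.
func readUint(r *bufio.Reader) (uint64, error) {
	b, err := r.ReadByte()
	if err != nil {
		return 0, err
	}
	if b < 128 {
		return uint64(b), nil
	}
	n := 256 - int(b) // the first byte holds the negated byte count
	if n > 8 {
		return 0, errors.New("unsigned integer too long")
	}
	var u uint64
	for ; n > 0; n-- {
		c, err := r.ReadByte()
		if err != nil {
			return 0, io.ErrUnexpectedEOF
		}
		u = u<<8 | uint64(c)
	}
	return u, nil
}

// splitMessages cuts a stream into its length-prefixed pieces. It trusts the
// counts it reads, so it is only suitable for small, well-formed input.
func splitMessages(stream []byte) ([][]byte, error) {
	r := bufio.NewReader(bytes.NewReader(stream))
	var msgs [][]byte
	for {
		size, err := readUint(r)
		if err == io.EOF {
			return msgs, nil
		}
		if err != nil {
			return nil, err
		}
		msg := make([]byte, size)
		if _, err := io.ReadFull(r, msg); err != nil {
			return nil, err
		}
		msgs = append(msgs, msg)
	}
}

// pointStream is the encoding of Point{22, 33} as listed in this file.
var pointStream = []byte{
	0x1f, 0xff, 0x81, 0x03, 0x01, 0x01, 0x05, 0x50, 0x6f, 0x69, 0x6e, 0x74, 0x01, 0xff, 0x82, 0x00,
	0x01, 0x02, 0x01, 0x01, 0x58, 0x01, 0x04, 0x00, 0x01, 0x01, 0x59, 0x01, 0x04, 0x00, 0x00, 0x00,
	0x07, 0xff, 0x82, 0x01, 0x2c, 0x01, 0x42, 0x00,
}

func main() {
	msgs, err := splitMessages(pointStream)
	if err != nil {
		log.Fatal(err)
	}
	for _, m := range msgs {
		fmt.Printf("%d bytes: % x\n", len(m), m)
	}
}
```

The first piece is the 31-byte type descriptor and the second is the 7-byte
value.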

See "Gobs of data" for a design discussion of the gob wire format:
http://blog.golang.org/2011/03/gobs-of-data.html
*/
package gob

/*
Grammar:

Tokens starting with a lower case letter are terminals; int(n)
and uint(n) represent the signed/unsigned encodings of the value n.

GobStream:
	DelimitedMessage*
DelimitedMessage:
	uint(lengthOfMessage) Message
Message:
	TypeSequence TypedValue
TypeSequence:
	(TypeDefinition DelimitedTypeDefinition*)?
DelimitedTypeDefinition:
	uint(lengthOfTypeDefinition) TypeDefinition
TypedValue:
	int(typeId) Value
TypeDefinition:
	int(-typeId) encodingOfWireType
Value:
	SingletonValue | StructValue
SingletonValue:
	uint(0) FieldValue
FieldValue:
	builtinValue | ArrayValue | MapValue | SliceValue | StructValue | InterfaceValue
InterfaceValue:
	NilInterfaceValue | NonNilInterfaceValue
NilInterfaceValue:
	uint(0)
NonNilInterfaceValue:
	ConcreteTypeName TypeSequence InterfaceContents
ConcreteTypeName:
	uint(lengthOfName) [already read=n] name
InterfaceContents:
	int(concreteTypeId) DelimitedValue
DelimitedValue:
	uint(length) Value
ArrayValue:
	uint(n) FieldValue*n [n elements]
MapValue:
	uint(n) (FieldValue FieldValue)*n  [n (key, value) pairs]
SliceValue:
	uint(n) FieldValue*n [n elements]
StructValue:
	(uint(fieldDelta) FieldValue)*
*/

/*
For implementers and the curious, here is an encoded example. Given
	type Point struct {X, Y int}
and the value
	p := Point{22, 33}
the bytes transmitted that encode p will be:
	1f ff 81 03 01 01 05 50 6f 69 6e 74 01 ff 82 00
	01 02 01 01 58 01 04 00 01 01 59 01 04 00 00 00
	07 ff 82 01 2c 01 42 00
They are determined as follows.

Since this is the first transmission of type Point, the type descriptor
for Point itself must be sent before the value. This is the first type
we've sent on this Encoder, so it has type id 65 (0 through 64 are
reserved).

	1f	// This item (a type descriptor) is 31 bytes long.
	ff 81	// The negative of the id for the type we're defining, -65.
		// This is one byte (indicated by FF = -1) followed by
		// ^-65<<1 | 1. The low 1 bit signals to complement the
		// rest upon receipt.

	// Now we send a type descriptor, which is itself a struct (wireType).
	// The type of wireType itself is known (it's built in, as is the type of
	// all its components), so we just need to send a *value* of type wireType
	// that represents type "Point".
	// Here starts the encoding of that value.
	// Set the field number implicitly to -1; this is done at the beginning
	// of every struct, including nested structs.
	03	// Add 3 to field number; now 2 (wireType.structType; this is a struct).
		// structType starts with an embedded CommonType, which appears
		// as a regular structure here too.
	01	// add 1 to field number (now 0); start of embedded CommonType.
	01	// add 1 to field number (now 0, the name of the type)
	05	// string is (unsigned) 5 bytes long
	50 6f 69 6e 74	// wireType.structType.CommonType.name = "Point"
	01	// add 1 to field number (now 1, the id of the type)
	ff 82	// wireType.structType.CommonType._id = 65
	00	// end of embedded wireType.structType.CommonType struct
	01	// add 1 to field number (now 1, the field array in wireType.structType)
	02	// There are two fields in the type (len(structType.field))
	01	// Start of first field structure; add 1 to get field number 0: field[0].name
	01	// 1 byte
	58	// structType.field[0].name = "X"
	01	// Add 1 to get field number 1: field[0].id
	04	// structType.field[0].typeId is 2 (signed int).
	00	// End of structType.field[0]; start structType.field[1]; set field number to -1.
	01	// Add 1 to get field number 0: field[1].name
	01	// 1 byte
	59	// structType.field[1].name = "Y"
	01	// Add 1 to get field number 1: field[1].id
	04	// structType.field[1].typeId is 2 (signed int).
	00	// End of structType.field[1]; end of structType.field.
	00	// end of wireType.structType structure
	00	// end of wireType structure

Now we can send the Point value. Again the field number resets to -1:

	07	// this value is 7 bytes long
	ff 82	// the type number, 65 (1 byte (-FF) followed by 65<<1)
	01	// add one to field number, yielding field 0
	2c	// encoding of signed "22" (0x2c = 44 = 22<<1); Point.x = 22
	01	// add one to field number, yielding field 1
	42	// encoding of signed "33" (0x42 = 66 = 33<<1); Point.y = 33
	00	// end of structure

The type encoding is long and fairly intricate but we send it only once.
If p is transmitted a second time, the type is already known so the
output will be just:

	07 ff 82 01 2c 01 42 00
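
To compare the first and second transmissions on a live Encoder, something
like the following sketch can be used. It assumes the encoding/gob import path
of current Go releases; encodeTwice is a hypothetical helper. The exact type
id depends on what the process has encoded before, so the sketch prints the
bytes rather than hard-coding them:

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
	"log"
)

type Point struct {
	X, Y int
}

// encodeTwice sends p twice on one Encoder and returns the bytes produced
// by each call to Encode.
func encodeTwice(p Point) (first, second []byte, err error) {
	var buf bytes.Buffer
	enc := gob.NewEncoder(&buf)

	if err = enc.Encode(p); err != nil {
		return nil, nil, err
	}
	first = append([]byte(nil), buf.Bytes()...) // copy before the buffer is reused
	buf.Reset()

	if err = enc.Encode(p); err != nil {
		return nil, nil, err
	}
	second = append([]byte(nil), buf.Bytes()...)
	return first, second, nil
}

func main() {
	first, second, err := encodeTwice(Point{22, 33})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("first:  % x\n", first)
	fmt.Printf("second: % x\n", second)
}
```

The second output should be the tail of the first: the type descriptor is
gone and only the value item remains.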

A single non-struct value at top level is transmitted like a field with
delta tag 0. For instance, a signed integer with value 3 presented as
the argument to Encode will emit:

	03 04 00 06

Which represents:

	03	// this value is 3 bytes long
	04	// the type number, 2, represents an integer
	00	// tag delta 0
	06	// value 3
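
The same four bytes can be reproduced with a short sketch, assuming the
encoding/gob import path of current Go releases; encodeInt is a hypothetical
helper:

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
	"log"
)

// encodeInt returns the bytes that Encode emits for a lone int. No type
// descriptor is needed because int is one of the predefined types.
func encodeInt(v int) ([]byte, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(v); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	b, err := encodeInt(3)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("% x\n", b)
}
```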

*/