                 Last update: 2005-01-17, version 1.4

This file is maintained by H. Peter Anvin as part
of the Linux Assigned Names And Numbers Authority (LANANA) project.
The current version can be found at:

            http://www.lanana.org/docs/unicode/unicode.txt

                       ------------------------

The Linux kernel code has been rewritten to use Unicode to map
characters to fonts. By downloading a single Unicode-to-font table,
both the eight-bit character sets and UTF-8 mode are changed to use
the font as indicated.

This changes the semantics of the eight-bit character tables subtly.
The four character tables are now:

Map symbol      Map name                        Escape code (G0)

LAT1_MAP        Latin-1 (ISO 8859-1)            ESC ( B
GRAF_MAP        DEC VT100 pseudographics        ESC ( 0
IBMPC_MAP       IBM code page 437               ESC ( U
USER_MAP        User defined                    ESC ( K

In particular, ESC ( U is no longer "straight to font", since the font
might be completely different than the IBM character set. This
permits, for example, the use of block graphics even with a Latin-1
font loaded.

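The escape codes in the table above can be sketched as a small lookup.
This is only an illustration: the names G0_SELECT, select_g0 and with_map
are hypothetical, and it assumes that the spaces in "ESC ( B" are
notation only, so the bytes actually sent are ESC, "(" and the final
character with nothing in between. Restoring LAT1_MAP afterwards is
likewise an assumption about what the console was using before.

```python
# Escape sequences that select a character map into G0, taken from the
# table above. All names here are illustrative, not a kernel interface.
ESC = "\x1b"

G0_SELECT = {
    "LAT1_MAP": ESC + "(B",   # Latin-1 (ISO 8859-1)
    "GRAF_MAP": ESC + "(0",   # DEC VT100 pseudographics
    "IBMPC_MAP": ESC + "(U",  # IBM code page 437
    "USER_MAP": ESC + "(K",   # User defined
}


def select_g0(map_symbol: str) -> str:
    """Return the escape sequence that selects map_symbol into G0."""
    try:
        return G0_SELECT[map_symbol]
    except KeyError:
        raise ValueError(f"unknown map symbol: {map_symbol!r}") from None


def with_map(map_symbol: str, text: str) -> str:
    """Wrap text so it is drawn with map_symbol, then switch back to
    LAT1_MAP (assumed here to be the map in use beforehand)."""
    return select_g0(map_symbol) + text + select_g0("LAT1_MAP")
```

Writing with_map("GRAF_MAP", "lqqqk") to a Linux console would then be
expected to draw the text through the DEC VT100 pseudographics map,
whatever font happens to be loaded.
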
Note that although these codes are similar to ISO 2022, neither the
codes nor their uses match ISO 2022; Linux has two 8-bit codes (G0 and
G1), whereas ISO 2022 has four 7-bit codes (G0-G3).

In accordance with the Unicode standard/ISO 10646 the range U+F000 to
U+F8FF has been reserved for OS-wide allocation (the Unicode Standard
refers to this as a "Corporate Zone"; since this is inaccurate for
Linux, we call it the "Linux Zone"). U+F000 was picked as the starting
point since it lets the direct-mapping area start on a large power of
two (in case 1024- or 2048-character fonts ever become necessary).
This leaves U+E000 to U+EFFF as End User Zone.

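As a rough illustration of the two zones just described, the following
sketch classifies a code point by range. The constant and function names
are hypothetical; only the range boundaries come from the text above.

```python
# Range boundaries as stated in the text; the names are illustrative.
END_USER_ZONE = range(0xE000, 0xF000)  # U+E000..U+EFFF
LINUX_ZONE = range(0xF000, 0xF900)     # U+F000..U+F8FF


def classify(code_point: int) -> str:
    """Name the zone that code_point falls in, if any."""
    if code_point in END_USER_ZONE:
        return "End User Zone"
    if code_point in LINUX_ZONE:
        return "Linux Zone"
    return "outside U+E000..U+F8FF"


def is_aligned_for_font(font_size: int) -> bool:
    """Check that the Linux Zone starts on a multiple of font_size, so a
    font of that many characters would begin on a clean boundary."""
    return LINUX_ZONE.start % font_size == 0
```

With these definitions, is_aligned_for_font(1024) and
is_aligned_for_font(2048) both hold, which is the property the choice of
U+F000 is meant to provide.
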
[v1.2]: The Unicode range from U+F000 up to U+F7FF has been
hard-coded to map directly to the loaded font, bypassing the
translation table. The user-defined map now defaults to U+F000 to
U+F0FF, emulating the previous behaviour. In practice, this range
might be shorter; for example, vgacon can only handle 256-character
(U+F000..U+F0FF) or 512-character (U+F000..U+F1FF) fonts.

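The direct-to-font range can be illustrated with a little arithmetic.
The sketch below assumes that glyph number N of the loaded font
corresponds to U+F000 + N, which is how the ranges quoted for vgacon
line up; the function names and the font_size parameter are hypothetical
and do not correspond to any kernel interface.

```python
from typing import Optional

DIRECT_BASE = 0xF000   # first code point of the direct-mapping area
DIRECT_LAST = 0xF7FF   # last code point hard-coded as direct-to-font


def glyph_to_code_point(glyph_index: int, font_size: int = 256) -> int:
    """Return the code point that addresses glyph_index directly."""
    if not 0 <= glyph_index < font_size:
        raise ValueError("glyph index outside the loaded font")
    code_point = DIRECT_BASE + glyph_index
    if code_point > DIRECT_LAST:
        raise ValueError("glyph index beyond the direct-mapping area")
    return code_point


def code_point_to_glyph(code_point: int, font_size: int = 256) -> Optional[int]:
    """Return the glyph index for a direct-mapped code point, or None if
    the code point is not covered by a font of font_size characters."""
    index = code_point - DIRECT_BASE
    limit = min(font_size, DIRECT_LAST - DIRECT_BASE + 1)
    if 0 <= index < limit:
        return index
    return None
```

For instance, glyph 0x41 of the loaded font would be addressed as U+F041,
which in UTF-8 mode is the byte sequence EF 81 81.
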
Actual characters assigned in the Linux Zone
--------------------------------------------

In addition, the following characters not present in Unicode 1.1.4
have been defined; these are used by the DEC VT graphics map. [v1.2]
THIS USE IS OBSOLETE AND SHOULD NO LONGER BE USED; PLEASE SEE BELOW.

U+F800 DEC VT GRAPHICS HORIZONTAL LINE SCAN 1
U+F801 DEC VT GRAPHICS HORIZONTAL LINE SCAN 3
U+F803 DEC VT GRAPHICS HORIZONTAL LINE SCAN 7
U+F804 DEC VT GRAPHICS HORIZONTAL LINE SCAN 9

The DEC VT220 uses a 6x10 character matrix, and these characters form
a smooth progression in the DEC VT graphics character set. I have
omitted the scan 5 line, since it is also used as a block-graphics
character, and hence has been coded as U+2500 FORMS LIGHT HORIZONTAL.

[v1.3]: These characters have been officially added to Unicode 3.2.0;
they are added at U+23BA, U+23BB, U+23BC, U+23BD. Linux now uses the
new values.

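Text that still carries the obsolete Linux Zone values could be migrated
with a simple translation table. The sketch below assumes the four old
code points pair up with the four new ones in the order listed (scan 1,
3, 7, 9); the names OLD_TO_NEW and modernize are hypothetical.

```python
# Obsolete Linux Zone code points and their Unicode 3.2.0 replacements,
# paired in listing order (an assumption, see the lead-in).
OLD_TO_NEW = {
    0xF800: 0x23BA,  # horizontal line, scan 1
    0xF801: 0x23BB,  # horizontal line, scan 3
    0xF803: 0x23BC,  # horizontal line, scan 7
    0xF804: 0x23BD,  # horizontal line, scan 9
}


def modernize(text: str) -> str:
    """Replace the obsolete DEC VT scan-line code points in text with
    their standard equivalents; all other characters pass through."""
    return text.translate(OLD_TO_NEW)
```

U+F802 is deliberately absent from the table, since the scan 5 line was
never assigned there.
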
[v1.2]: The following characters have been added to represent common
keyboard symbols that are unlikely to ever be added to Unicode proper
since they are horribly vendor-specific. This, of course, is an
excellent example of horrible design.

U+F810 KEYBOARD SYMBOL FLYING FLAG
U+F811 KEYBOARD SYMBOL PULLDOWN MENU
U+F812 KEYBOARD SYMBOL OPEN APPLE
U+F813 KEYBOARD SYMBOL SOLID APPLE

Klingon language support
------------------------

In 1996, Linux was the first operating system in the world to add
support for the artificial language Klingon, created by Marc Okrand
for the "Star Trek" television series. This encoding was later
adopted by the ConScript Unicode Registry and proposed (but ultimately
rejected) for inclusion in Unicode Plane 1. Thus, it remains as a
Linux/CSUR private assignment in the Linux Zone.

This encoding has been endorsed by the Klingon Language Institute.
For more information, contact them at:

            http://www.kli.org/

Since the characters in the beginning of the Linux CZ have been more
of the dingbats/symbols/forms type and this is a language, I have
located it at the end, on a 16-cell boundary in keeping with standard
Unicode practice.

NOTE: This range is now officially managed by the ConScript Unicode
Registry. The normative reference is at:

            http://www.evertype.com/standards/csur/klingon.html

Klingon has an alphabet of 26 characters, a positional numeric writing
system with 10 digits, and is written left-to-right, top-to-bottom.

Several glyph forms for the Klingon alphabet have been proposed.
However, since the set of symbols appears to be consistent throughout,
with only the actual shapes being different, in keeping with standard
Unicode practice these differences are considered font variants.

U+F8D0 KLINGON LETTER A
U+F8D1 KLINGON LETTER B
U+F8D2 KLINGON LETTER CH
U+F8D3 KLINGON LETTER D
U+F8D4 KLINGON LETTER E
U+F8D5 KLINGON LETTER GH
U+F8D6 KLINGON LETTER H
U+F8D7 KLINGON LETTER I
U+F8D8 KLINGON LETTER J
U+F8D9 KLINGON LETTER L
U+F8DA KLINGON LETTER M
U+F8DB KLINGON LETTER N
U+F8DC KLINGON LETTER NG
U+F8DD KLINGON LETTER O
U+F8DE KLINGON LETTER P
U+F8DF KLINGON LETTER Q
       - Written <q> in standard Okrand Latin transliteration
U+F8E0 KLINGON LETTER QH
       - Written <Q> in standard Okrand Latin transliteration
U+F8E1 KLINGON LETTER R
U+F8E2 KLINGON LETTER S
U+F8E3 KLINGON LETTER T
U+F8E4 KLINGON LETTER TLH
U+F8E5 KLINGON LETTER U
U+F8E6 KLINGON LETTER V
U+F8E7 KLINGON LETTER W
U+F8E8 KLINGON LETTER Y
U+F8E9 KLINGON LETTER GLOTTAL STOP

U+F8F0 KLINGON DIGIT ZERO
U+F8F1 KLINGON DIGIT ONE
U+F8F2 KLINGON DIGIT TWO
U+F8F3 KLINGON DIGIT THREE
U+F8F4 KLINGON DIGIT FOUR
U+F8F5 KLINGON DIGIT FIVE
U+F8F6 KLINGON DIGIT SIX
U+F8F7 KLINGON DIGIT SEVEN
U+F8F8 KLINGON DIGIT EIGHT
U+F8F9 KLINGON DIGIT NINE

U+F8FD KLINGON COMMA
U+F8FE KLINGON FULL STOP
U+F8FF KLINGON SYMBOL FOR EMPIRE

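The assignments above can be exercised with a small encoder. This is a
sketch under several assumptions that are not stated in this file: the
Latin spellings in KLINGON_LETTERS follow the usual Okrand
transliteration (with q for KLINGON LETTER Q, Q for KLINGON LETTER QH,
and an apostrophe for the glottal stop), input is split by greedy
longest match, and numbers are written most significant digit first.
All names in the code are hypothetical.

```python
# Okrand Latin spellings in code point order, U+F8D0 through U+F8E9.
# The spellings are an assumption; only the order comes from the table.
KLINGON_LETTERS = [
    "a", "b", "ch", "D", "e", "gh", "H", "I", "j", "l", "m", "n", "ng",
    "o", "p", "q", "Q", "r", "S", "t", "tlh", "u", "v", "w", "y", "'",
]

LETTER_TO_CP = {s: 0xF8D0 + i for i, s in enumerate(KLINGON_LETTERS)}

# Try longer spellings first so "tlh" wins over "t" and "ng" over "n".
_TOKENS = sorted(LETTER_TO_CP, key=len, reverse=True)


def encode_klingon(text: str) -> str:
    """Convert transliterated text to Linux Zone code points. Characters
    that are not part of any known spelling are passed through."""
    out = []
    pos = 0
    while pos < len(text):
        for token in _TOKENS:
            if text.startswith(token, pos):
                out.append(chr(LETTER_TO_CP[token]))
                pos += len(token)
                break
        else:
            out.append(text[pos])
            pos += 1
    return "".join(out)


def klingon_digits(number: int) -> str:
    """Write a non-negative integer with KLINGON DIGIT ZERO..NINE."""
    if number < 0:
        raise ValueError("only non-negative integers are supported")
    return "".join(chr(0xF8F0 + int(d)) for d in str(number))
```

Greedy matching is a simplification: it cannot tell the letter NG from
an N followed by GH, so real text may need a dictionary or explicit
separators to be encoded reliably.
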
Other Fictional and Artificial Scripts
--------------------------------------

Since the assignment of the Klingon Linux Unicode block, a registry of
fictional and artificial scripts has been established by John Cowan
and Michael Everson.
The ConScript Unicode Registry is accessible at:

            http://www.evertype.com/standards/csur/

The ranges used fall at the low end of the End User Zone and can hence
not be normatively assigned, but it is recommended that people who
wish to encode fictional scripts use these codes, in the interest of
interoperability. For Klingon, CSUR has adopted the Linux encoding.
The CSUR people are working to add Tengwar and Cirth to Unicode
Plane 1; the addition of Klingon to Unicode Plane 1 has been rejected
and so the above encoding remains official.