The Linux kernel code has been rewritten to use Unicode to map
|
The Linux kernel code has been rewritten to use Unicode to map
|
characters to fonts. By downloading a single Unicode-to-font table,
|
characters to fonts. By downloading a single Unicode-to-font table,
|
both the eight-bit character sets and UTF-8 mode are changed to use
|
both the eight-bit character sets and UTF-8 mode are changed to use
|
the font as indicated.
|
the font as indicated.
|
|
|
This changes the semantics of the eight-bit character tables subtly.
|
This changes the semantics of the eight-bit character tables subtly.
|
The four character tables are now:
|
The four character tables are now:
|
|
|
Map symbol Map name Escape code (G0)
|
Map symbol Map name Escape code (G0)
|
|
|
LAT1_MAP Latin-1 (ISO 8859-1) ESC ( B
|
LAT1_MAP Latin-1 (ISO 8859-1) ESC ( B
|
GRAF_MAP DEC VT100 pseudographics ESC ( 0
|
GRAF_MAP DEC VT100 pseudographics ESC ( 0
|
IBMPC_MAP IBM code page 437 ESC ( U
|
IBMPC_MAP IBM code page 437 ESC ( U
|
USER_MAP User defined ESC ( K
|
USER_MAP User defined ESC ( K
|
|
|
In particular, ESC ( U is no longer "straight to font", since the font
|
In particular, ESC ( U is no longer "straight to font", since the font
|
might be completely different than the IBM character set. This
|
might be completely different than the IBM character set. This
|
permits for example the use of block graphics even with a Latin-1 font
|
permits for example the use of block graphics even with a Latin-1 font
|
loaded.
|
loaded.
|
|
|
In accordance with the Unicode standard/ISO 10646 the range U+F000 to
|
In accordance with the Unicode standard/ISO 10646 the range U+F000 to
|
U+F8FF has been reserved for OS-wide allocation (the Unicode Standard
|
U+F8FF has been reserved for OS-wide allocation (the Unicode Standard
|
refers to this as a "Corporate Zone", since this is inaccurate for
|
refers to this as a "Corporate Zone", since this is inaccurate for
|
Linux we call it the "Linux Zone"). U+F000 was picked as the starting
|
Linux we call it the "Linux Zone"). U+F000 was picked as the starting
|
point since it lets the direct-mapping area start on a large power of
|
point since it lets the direct-mapping area start on a large power of
|
two (in case 1024- or 2048-character fonts ever become necessary).
|
two (in case 1024- or 2048-character fonts ever become necessary).
|
This leaves U+E000 to U+EFFF as End User Zone.
|
This leaves U+E000 to U+EFFF as End User Zone.
|
|
|
The Unicodes in the range U+F000 to U+F1FF have been hard-coded to map
|
The Unicodes in the range U+F000 to U+F1FF have been hard-coded to map
|
directly to the loaded font, bypassing the translation table. The
|
directly to the loaded font, bypassing the translation table. The
|
user-defined map now defaults to U+F000 to U+F1FF, emulating the
|
user-defined map now defaults to U+F000 to U+F1FF, emulating the
|
previous behaviour. This range may expand in the future should it be
|
previous behaviour. This range may expand in the future should it be
|
warranted.
|
warranted.
|
|
|
Actual characters assigned in the Linux Zone
|
Actual characters assigned in the Linux Zone
|
--------------------------------------------
|
--------------------------------------------
|
|
|
In addition, the following characters not present in Unicode 1.1.4 (at
|
In addition, the following characters not present in Unicode 1.1.4 (at
|
least, I have not found them!) have been defined; these are used by
|
least, I have not found them!) have been defined; these are used by
|
the DEC VT graphics map:
|
the DEC VT graphics map:
|
|
|
U+F800 DEC VT GRAPHICS HORIZONTAL LINE SCAN 1
|
U+F800 DEC VT GRAPHICS HORIZONTAL LINE SCAN 1
|
U+F801 DEC VT GRAPHICS HORIZONTAL LINE SCAN 3
|
U+F801 DEC VT GRAPHICS HORIZONTAL LINE SCAN 3
|
U+F803 DEC VT GRAPHICS HORIZONTAL LINE SCAN 7
|
U+F803 DEC VT GRAPHICS HORIZONTAL LINE SCAN 7
|
U+F804 DEC VT GRAPHICS HORIZONTAL LINE SCAN 9
|
U+F804 DEC VT GRAPHICS HORIZONTAL LINE SCAN 9
|
|
|
The DEC VT220 uses a 6x10 character matrix, and these characters form
|
The DEC VT220 uses a 6x10 character matrix, and these characters form
|
a smooth progression in the DEC VT graphics character set. I have
|
a smooth progression in the DEC VT graphics character set. I have
|
omitted the scan 5 line, since it is also used as a block-graphics
|
omitted the scan 5 line, since it is also used as a block-graphics
|
character, and hence has been coded as U+2500 FORMS LIGHT HORIZONTAL.
|
character, and hence has been coded as U+2500 FORMS LIGHT HORIZONTAL.
|
However, I left U+F802 blank should the need arise.
|
However, I left U+F802 blank should the need arise.
|
|
|
Klingon language support
|
Klingon language support
|
------------------------
|
------------------------
|
|
|
Unfortunately, Unicode/ISO 10646 does not allocate code points for the
|
Unfortunately, Unicode/ISO 10646 does not allocate code points for the
|
language Klingon, probably fearing the potential code point explosion
|
language Klingon, probably fearing the potential code point explosion
|
if many fictional languages were submitted for inclusion. There are
|
if many fictional languages were submitted for inclusion. There are
|
also political reasons (the Japanese, for example, are not too happy
|
also political reasons (the Japanese, for example, are not too happy
|
about the whole 16-bit concept to begin with.) However, with Linux
|
about the whole 16-bit concept to begin with.) However, with Linux
|
being a hacker-driven OS it seems this is a brilliant linguistic hack
|
being a hacker-driven OS it seems this is a brilliant linguistic hack
|
worth supporting. Hence I have chosen to add it to the list in the
|
worth supporting. Hence I have chosen to add it to the list in the
|
Linux Zone.
|
Linux Zone.
|
|
|
Several glyph forms for the Klingon alphabet has been proposed.
|
Several glyph forms for the Klingon alphabet has been proposed.
|
However, since the set of symbols appear to be consistent throughout,
|
However, since the set of symbols appear to be consistent throughout,
|
with only the actual shapes being different, in keeping with standard
|
with only the actual shapes being different, in keeping with standard
|
Unicode practice these differences are considered font variants.
|
Unicode practice these differences are considered font variants.
|
|
|
Klingon has an alphabet of 26 characters, a positional numeric writing
|
Klingon has an alphabet of 26 characters, a positional numeric writing
|
system with 10 digits, and is written left-to-right, top-to-bottom.
|
system with 10 digits, and is written left-to-right, top-to-bottom.
|
Punctuation appears to be only used in Latin transliteration; it is
|
Punctuation appears to be only used in Latin transliteration; it is
|
appears customary to write each sentence on its own line, and
|
appears customary to write each sentence on its own line, and
|
centered. Space has been reserved for punctuation should it prove
|
centered. Space has been reserved for punctuation should it prove
|
necessary.
|
necessary.
|
|
|
This encoding has been endorsed by the Klingon Language Institute.
|
This encoding has been endorsed by the Klingon Language Institute.
|
For more information, contact them at:
|
For more information, contact them at:
|
|
|
http://www.kli.org/
|
http://www.kli.org/
|
|
|
Since the characters in the beginning of the Linux CZ have been more
|
Since the characters in the beginning of the Linux CZ have been more
|
of the dingbats/symbols/forms type and this is a language, I have
|
of the dingbats/symbols/forms type and this is a language, I have
|
located it at the end, on a 16-cell boundary in keeping with standard
|
located it at the end, on a 16-cell boundary in keeping with standard
|
Unicode practice.
|
Unicode practice.
|
|
|
U+F8D0 KLINGON LETTER A
|
U+F8D0 KLINGON LETTER A
|
U+F8D1 KLINGON LETTER B
|
U+F8D1 KLINGON LETTER B
|
U+F8D2 KLINGON LETTER CH
|
U+F8D2 KLINGON LETTER CH
|
U+F8D3 KLINGON LETTER D
|
U+F8D3 KLINGON LETTER D
|
U+F8D4 KLINGON LETTER E
|
U+F8D4 KLINGON LETTER E
|
U+F8D5 KLINGON LETTER GH
|
U+F8D5 KLINGON LETTER GH
|
U+F8D6 KLINGON LETTER H
|
U+F8D6 KLINGON LETTER H
|
U+F8D7 KLINGON LETTER I
|
U+F8D7 KLINGON LETTER I
|
U+F8D8 KLINGON LETTER J
|
U+F8D8 KLINGON LETTER J
|
U+F8D9 KLINGON LETTER L
|
U+F8D9 KLINGON LETTER L
|
U+F8DA KLINGON LETTER M
|
U+F8DA KLINGON LETTER M
|
U+F8DB KLINGON LETTER N
|
U+F8DB KLINGON LETTER N
|
U+F8DC KLINGON LETTER NG
|
U+F8DC KLINGON LETTER NG
|
U+F8DD KLINGON LETTER O
|
U+F8DD KLINGON LETTER O
|
U+F8DE KLINGON LETTER P
|
U+F8DE KLINGON LETTER P
|
U+F8DF KLINGON LETTER Q
|
U+F8DF KLINGON LETTER Q
|
- Written in standard Okrand Latin transliteration
|
- Written in standard Okrand Latin transliteration
|
U+F8E0 KLINGON LETTER QH
|
U+F8E0 KLINGON LETTER QH
|
- Written in standard Okrand Latin transliteration
|
- Written in standard Okrand Latin transliteration
|
U+F8E1 KLINGON LETTER R
|
U+F8E1 KLINGON LETTER R
|
U+F8E2 KLINGON LETTER S
|
U+F8E2 KLINGON LETTER S
|
U+F8E3 KLINGON LETTER T
|
U+F8E3 KLINGON LETTER T
|
U+F8E4 KLINGON LETTER TLH
|
U+F8E4 KLINGON LETTER TLH
|
U+F8E5 KLINGON LETTER U
|
U+F8E5 KLINGON LETTER U
|
U+F8E6 KLINGON LETTER V
|
U+F8E6 KLINGON LETTER V
|
U+F8E7 KLINGON LETTER W
|
U+F8E7 KLINGON LETTER W
|
U+F8E8 KLINGON LETTER Y
|
U+F8E8 KLINGON LETTER Y
|
U+F8E9 KLINGON LETTER GLOTTAL STOP
|
U+F8E9 KLINGON LETTER GLOTTAL STOP
|
|
|
U+F8F0 KLINGON DIGIT ZERO
|
U+F8F0 KLINGON DIGIT ZERO
|
U+F8F1 KLINGON DIGIT ONE
|
U+F8F1 KLINGON DIGIT ONE
|
U+F8F2 KLINGON DIGIT TWO
|
U+F8F2 KLINGON DIGIT TWO
|
U+F8F3 KLINGON DIGIT THREE
|
U+F8F3 KLINGON DIGIT THREE
|
U+F8F4 KLINGON DIGIT FOUR
|
U+F8F4 KLINGON DIGIT FOUR
|
U+F8F5 KLINGON DIGIT FIVE
|
U+F8F5 KLINGON DIGIT FIVE
|
U+F8F6 KLINGON DIGIT SIX
|
U+F8F6 KLINGON DIGIT SIX
|
U+F8F7 KLINGON DIGIT SEVEN
|
U+F8F7 KLINGON DIGIT SEVEN
|
U+F8F8 KLINGON DIGIT EIGHT
|
U+F8F8 KLINGON DIGIT EIGHT
|
U+F8F9 KLINGON DIGIT NINE
|
U+F8F9 KLINGON DIGIT NINE
|
|
|
Other Fictional and Artificial Scripts
|
Other Fictional and Artificial Scripts
|
--------------------------------------
|
--------------------------------------
|
|
|
Since the assignment of the Klingon Linux Unicode block, a registry of
|
Since the assignment of the Klingon Linux Unicode block, a registry of
|
fictional and artificial scripts has been established by John Cowan,
|
fictional and artificial scripts has been established by John Cowan,
|
. The ConScript Unicode Registry is accessible at
|
. The ConScript Unicode Registry is accessible at
|
http://locke.ccil.org/~cowan/csur/; the ranges used fall at the bottom
|
http://locke.ccil.org/~cowan/csur/; the ranges used fall at the bottom
|
of the End User Zone and can hence not be normatively assigned, but it
|
of the End User Zone and can hence not be normatively assigned, but it
|
is recommended that people who wish to encode fictional scripts use
|
is recommended that people who wish to encode fictional scripts use
|
these codes, in the interest of interoperability. For Klingon, CSUR
|
these codes, in the interest of interoperability. For Klingon, CSUR
|
has adopted the Linux encoding.
|
has adopted the Linux encoding.
|
|
|
H. Peter Anvin
|
H. Peter Anvin
|
|
|