<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<meta name="AUTHOR" content="bkoz@redhat.com (Benjamin Kosnik)" />
<meta name="KEYWORDS" content="HOWTO, libstdc++, GCC, g++, libg++, STL" />
<meta name="DESCRIPTION" content="Notes on the messages implementation." />
<title>Notes on the messages implementation.</title>
<link rel="StyleSheet" href="../lib3styles.css" type="text/css" />
<link rel="Start" href="../documentation.html" type="text/html"
title="GNU C++ Standard Library" />
<link rel="Bookmark" href="howto.html" type="text/html" title="Localization" />
<link rel="Copyright" href="../17_intro/license.html" type="text/html" />
<link rel="Help" href="../faq/index.html" type="text/html" title="F.A.Q." />
</head>
<body>
<h1>
Notes on the messages implementation.
</h1>
<em>
prepared by Benjamin Kosnik (bkoz@redhat.com) on August 8, 2001
</em>

<h2>
1. Abstract
</h2>
<p>
The std::messages facet implements message retrieval functionality
equivalent to Java's java.text.MessageFormat, using either GNU gettext
or IEEE 1003.1-200x functions.
</p>

<h2>
2. What the standard says
</h2>
The std::messages facet is probably the most vaguely defined facet in
the standard library. It's assumed that this facility was built into
the standard library in order to convert string literals from one
locale to another. For instance, it could convert the "C" locale's
<code>const char* c = "please"</code> to a German-localized <code>"bitte"</code>
during program execution.

<blockquote>
22.2.7.1 - Template class messages [lib.locale.messages]
</blockquote>

This class has three public member functions, which directly
correspond to three protected virtual member functions.

The public member functions are:

<p>
<code>catalog open(const string&amp;, const locale&amp;) const</code>
</p>

<p>
<code>string_type get(catalog, int, int, const string_type&amp;) const</code>
</p>

<p>
<code>void close(catalog) const</code>
</p>

<p>
The virtual functions are:
</p>

<p>
<code>catalog do_open(const string&amp;, const locale&amp;) const</code>
</p>
<blockquote>
<em>
-1- Returns: A value that may be passed to get() to retrieve a
message, from the message catalog identified by the string name
according to an implementation-defined mapping. The result can be used
until it is passed to close(). Returns a value less than 0 if no such
catalog can be opened.
</em>
</blockquote>

<p>
<code>string_type do_get(catalog, int, int, const string_type&amp;) const</code>
</p>
<blockquote>
<em>
-3- Requires: A catalog cat obtained from open() and not yet closed.
-4- Returns: A message identified by arguments set, msgid, and dfault,
according to an implementation-defined mapping. If no such message can
be found, returns dfault.
</em>
</blockquote>

<p>
<code>void do_close(catalog) const</code>
</p>
<blockquote>
<em>
-5- Requires: A catalog cat obtained from open() and not yet closed.
-6- Effects: Releases unspecified resources associated with cat.
-7- Notes: The limit on such resources, if any, is implementation-defined.
</em>
</blockquote>


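<p>
As a minimal sketch of how the three public member functions fit together,
the helper below opens a catalog, asks for a message, and falls back to the
caller's default string when the catalog cannot be opened. The name
<code>translate_or_default</code> is hypothetical and not part of libstdc++;
only the standard <code>open</code>, <code>get</code> and <code>close</code>
calls are used, so nothing is assumed about the configured model.
</p>

```cpp
#include <locale>
#include <string>

// Hypothetical helper, for illustration only (not part of libstdc++).
// Looks up `dfault` in the catalog `catalog_name` through the messages
// facet of `loc`. Returns `dfault` unchanged when the catalog cannot be
// opened or holds no translation for it.
std::string translate_or_default(const std::string& catalog_name,
                                 const std::string& dfault,
                                 const std::locale& loc = std::locale::classic())
{
  typedef std::messages<char> messages_type;
  const messages_type& facet = std::use_facet<messages_type>(loc);

  // The standard says open() returns a value less than 0 on failure.
  messages_type::catalog cat = facet.open(catalog_name, loc);
  if (cat < 0)
    return dfault;

  // The set and msgid arguments are passed as 0; how they are
  // interpreted is implementation-defined.
  std::string result = facet.get(cat, 0, 0, dfault);
  facet.close(cat);
  return result;
}
```

<p>
With a catalog name that does not exist, for example
<code>translate_or_default("no-such-catalog", "please")</code>, the call is
expected to hand back <code>"please"</code> unchanged.
</p>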
<h2>
3. Problems with "C" messages: thread safety,
over-specification, and assumptions.
</h2>
A couple of notes on the standard.

<p>
First, why is <code>messages_base::catalog</code> specified as a typedef
for int? This makes sense for implementations that use
<code>catopen</code>, but not for others. Fortunately, it's not heavily
used and so only a minor irritant.
</p>

<p>
Second, because the member functions are <code>const</code>, they
cannot modify the facet's ordinary data members. Thus, storing away
information in the 'open' member function for use in 'get' is not
possible without resorting to <code>mutable</code> members or to state
kept outside the facet. This is unfortunate.
</p>
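<p>
To make the constraint concrete, here is a rough sketch of a user-defined
facet that does carry information from 'open' to 'get' by declaring its
bookkeeping <code>mutable</code>. Everything in it (the class
<code>map_messages</code>, its <code>add</code> member, the in-memory tables,
the helper <code>demo_lookup</code>) is hypothetical and is not how libstdc++
implements the facet. The mutable members are also unprotected against
concurrent use, so the sketch is not thread-safe.
</p>

```cpp
#include <cstddef>
#include <locale>
#include <map>
#include <string>

// Hypothetical facet, for illustration only: keeps its catalogs in memory
// and remembers, per open catalog, the name that was passed to open().
class map_messages : public std::messages<char>
{
public:
  explicit map_messages(std::size_t refs = 0)
  : std::messages<char>(refs), next_id_(0) { }

  // Registers a translation for `text` in the catalog called `name`.
  void add(const std::string& name, const std::string& text,
           const std::string& translation)
  { tables_[name][text] = translation; }

protected:
  typedef std::map<std::string, std::string> table_type;

  virtual catalog do_open(const std::basic_string<char>& name,
                          const std::locale&) const
  {
    if (tables_.find(name) == tables_.end())
      return -1;                      // no such catalog
    open_names_[next_id_] = name;     // state saved for do_get()
    return next_id_++;
  }

  virtual string_type do_get(catalog cat, int, int,
                             const string_type& dfault) const
  {
    std::map<catalog, std::string>::const_iterator open_it =
      open_names_.find(cat);
    if (open_it == open_names_.end())
      return dfault;                  // catalog not open
    std::map<std::string, table_type>::const_iterator table_it =
      tables_.find(open_it->second);
    if (table_it == tables_.end())
      return dfault;
    table_type::const_iterator msg = table_it->second.find(dfault);
    return msg == table_it->second.end() ? dfault : msg->second;
  }

  virtual void do_close(catalog cat) const
  { open_names_.erase(cat); }

private:
  std::map<std::string, table_type> tables_;
  // mutable: modified from the const virtual functions above.
  mutable std::map<catalog, std::string> open_names_;
  mutable catalog next_id_;
};

// Usage sketch: install the facet in a locale and go through the
// ordinary public interface.
std::string demo_lookup(const std::string& text)
{
  map_messages* facet = new map_messages;   // owned by the locale below
  facet->add("demo", "please", "bitte");
  std::locale loc(std::locale::classic(), facet);

  const std::messages<char>& m = std::use_facet<std::messages<char> >(loc);
  std::messages<char>::catalog cat = m.open("demo", loc);
  std::string result = m.get(cat, 0, 0, text);
  m.close(cat);
  return result;
}
```

<p>
The price of this approach is that every <code>const</code> member now has
hidden side effects, which is part of why the design is awkward.
</p>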

<p>
The 'open' member function in particular seems to be oddly
designed. The signature seems quite peculiar. Why specify a
<code>const string&amp;</code> argument, for instance, instead of just
<code>const char*</code>? Or, why specify a <code>const locale&amp;</code>
argument that is to be used in the 'get' member function? How, exactly,
is this locale argument useful? What was the intent? It might make sense
if a locale argument was associated with a given default message string
in the 'open' member function, for instance. Quite murky and unclear, on
reflection.
</p>

<p>
Lastly, it seems odd that messages, which explicitly require code
conversion, don't use the codecvt facet. Because the messages facet
has only one template parameter, it is assumed that ctype, and not
codecvt, is to be used to convert between character sets.
</p>

<p>
It is implicitly assumed that the default message string passed to
'get' is in the "C" locale. Thus, all source code is assumed
to be written in English, so translations are always from "en_US" to
other, explicitly named locales.
</p>

<h2>
4. Design and Implementation Details
</h2>
This is a relatively simple class, on the face of it. The standard
specifies very little in concrete terms, so generic implementations
that are conforming yet do very little are the norm. Adding
functionality that would be useful to programmers and comparable to
Java's java.text.MessageFormat takes a bit of work, and is highly
dependent on the capabilities of the underlying operating system.

<p>
Three different mechanisms have been provided, selectable via
configure flags:
</p>

<ul>
<li> generic
<p>
This model does very little, and is what is used by default.
</p>
</li>

<li> gnu
<p>
The gnu model is complete and fully tested. It's based on the
GNU gettext package, which is part of glibc. It uses the functions
<code>textdomain, bindtextdomain, gettext</code>
to implement full functionality. Creating message
catalogs is a relatively straightforward process and is
lightly documented below, and fully documented in gettext's
distributed documentation.
</p>
</li>

<li> ieee_1003.1-200x
<p>
This is a complete, though untested, implementation based on
the IEEE standard. The functions
<code>catopen, catgets, catclose</code>
are used to retrieve locale-specific messages given the
appropriate message catalogs that have been constructed for
their use. Note, the script <code>po2msg.sed</code> that is part
of the gettext distribution can convert gettext catalogs into
catalogs that <code>catopen</code> can use.
</p>
</li>
</ul>
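<p>
For orientation, the two underlying "C" interfaces named in the list above
can be called directly, roughly as follows. This is only a sketch of the
raw calls, not the libstdc++ implementation; the wrapper names
<code>gettext_lookup</code> and <code>catgets_lookup</code> are hypothetical,
and the code assumes a system such as glibc that provides both
<code>&lt;libintl.h&gt;</code> and <code>&lt;nl_types.h&gt;</code>.
</p>

```cpp
#include <libintl.h>   // bindtextdomain, textdomain, gettext (GNU gettext)
#include <nl_types.h>  // catopen, catgets, catclose (IEEE 1003.1)
#include <string>

// gnu-style lookup (hypothetical wrapper). gettext() hands back `msg`
// itself when no translation is found. Note that textdomain() changes
// the default domain for the whole process.
std::string gettext_lookup(const char* domain, const char* dir,
                           const char* msg)
{
  bindtextdomain(domain, dir);  // where the catalogs for `domain` live
  textdomain(domain);           // make `domain` the default domain
  return gettext(msg);
}

// ieee_1003.1-style lookup (hypothetical wrapper). Messages are
// identified by a set number and a message number, and `dfault` is
// returned when the catalog or the message is missing.
std::string catgets_lookup(const char* name, int set_id, int msg_id,
                           const char* dfault)
{
  nl_catd cat = catopen(name, 0);
  if (cat == (nl_catd) -1)
    return dfault;              // catalog could not be opened
  std::string result = catgets(cat, set_id, msg_id, dfault);
  catclose(cat);
  return result;
}
```

<p>
The contrast in lookup keys is worth noticing: gettext is keyed by the
untranslated string itself, while <code>catgets</code> is keyed by a pair
of integers, which is the shape the standard's 'get' signature follows.
</p>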

<p>
A new, standards-conformant non-virtual member function signature was
added for 'open' so that a directory could be specified with a given
message catalog. This simplifies calling conventions for the gnu
model.
</p>

<p>
The rest of this document discusses details of the GNU model.
</p>

<p>
The messages facet, because it is retrieving and converting between
character sets, depends on the ctype and perhaps the codecvt facet in
a given locale. In addition, underlying "C" library locale support is
necessary for more than just the <code>LC_MESSAGES</code> mask:
<code>LC_CTYPE</code> is also necessary. To avoid any unpleasantness, all
bits of the "C" mask (i.e. <code>LC_ALL</code>) are set before retrieving
messages.
</p>

<p>
Making the message catalogs can be initially tricky, but becomes quite
simple with practice. For complete info, see the gettext
documentation. Here's an idea of what is required:
</p>

<ul>
<li> Make a source file with the required string literals
that need to be translated. See
<code>intl/string_literals.cc</code> for an example.
</li>

<li> Make the initial catalog (see "4 Making the PO Template File"
from the gettext docs).
<p>
<code>xgettext --c++ --debug string_literals.cc -o libstdc++.pot</code>
</p>
</li>

<li> Make language and country-specific locale catalogs.
<p>
<code>cp libstdc++.pot fr_FR.po</code>
</p>
<p>
<code>cp libstdc++.pot de_DE.po</code>
</p>
</li>

<li> Edit localized catalogs in emacs so that strings are
translated.
<p>
<code>emacs fr_FR.po</code>
</p>
</li>

<li> Make the binary mo files.
<p>
<code>msgfmt fr_FR.po -o fr_FR.mo</code>
</p>
<p>
<code>msgfmt de_DE.po -o de_DE.mo</code>
</p>
</li>

<li> Copy the binary files into the correct directory structure.
The file name must match the catalog name passed to 'open'.
<p>
<code>cp fr_FR.mo (dir)/fr_FR/LC_MESSAGES/libstdc++.mo</code>
</p>
<p>
<code>cp de_DE.mo (dir)/de_DE/LC_MESSAGES/libstdc++.mo</code>
</p>
</li>

<li> Use the new message catalogs.
<p>
<code>locale loc_de("de_DE");</code>
</p>
<p>
<code>
use_facet&lt;messages&lt;char&gt; &gt;(loc_de).open("libstdc++", locale(), dir);
</code>
</p>
</li>
</ul>

<h2>
5. Examples
</h2>

<ul>
<li> Message conversion: a simple example using the GNU model.

<pre>
#include &lt;iostream&gt;
#include &lt;locale&gt;
#include &lt;string&gt;
using namespace std;

void test01()
{
  typedef messages&lt;char&gt;::catalog catalog;
  // Directory that holds the per-locale LC_MESSAGES catalogs.
  const char* dir =
    "/mnt/egcs/build/i686-pc-linux-gnu/libstdc++-v3/po/share/locale";
  const locale loc_de("de_DE");
  const messages&lt;char&gt;&amp; mssg_de = use_facet&lt;messages&lt;char&gt; &gt;(loc_de);

  // The three-argument open() is the libstdc++ extension that takes a directory.
  catalog cat_de = mssg_de.open("libstdc++", loc_de, dir);
  string s01 = mssg_de.get(cat_de, 0, 0, "please");
  string s02 = mssg_de.get(cat_de, 0, 0, "thank you");
  cout &lt;&lt; "please in german:" &lt;&lt; s01 &lt;&lt; '\n';
  cout &lt;&lt; "thank you in german:" &lt;&lt; s02 &lt;&lt; '\n';
  mssg_de.close(cat_de);
}
</pre>
</li>
</ul>

More information can be found in the following testcases:
<ul>
<li> testsuite/22_locale/messages.cc </li>
<li> testsuite/22_locale/messages_byname.cc </li>
<li> testsuite/22_locale/messages_char_members.cc </li>
</ul>

<h2>
6. Unresolved Issues
</h2>
<ul>
<li> Things that are sketchy, or remain unimplemented:
<ul>
<li> _M_convert_from_char and _M_convert_to_char are in
flux, depending on how the library ends up doing
character set conversions. It might not be possible to
do a real character set based conversion, due to the
fact that the template parameter for messages is not
enough to instantiate the codecvt facet (1 supplied,
need at least 2 but would prefer 3).
</li>

<li> There are issues with gettext needing the global
locale set to extract a message. This dependence on
the global locale makes the current "gnu" model
non-MT-safe. Future versions of glibc, i.e. glibc 2.3.x, will
fix this, and the C++ library bits are already in
place.
</li>
</ul>
</li>

<li> Development versions of the GNU "C" library, glibc 2.3, will allow
a more efficient, MT-safe implementation of std::messages, and will
allow the removal of the _M_name_messages data member. If this
is done, it will change the library ABI. The C++ parts to
support glibc 2.3 have already been coded, but are not in use:
once this version of the "C" library is released, the marked
parts of the messages implementation can be switched over to
the new "C" library functionality.
</li>
<li> At some point in the near future, std::numpunct will probably use
std::messages facilities to implement truename/falsename
correctly. This is currently not done, but entries in
libstdc++.pot have already been made for "true" and "false"
string literals, so all that remains is the std::numpunct
coding and the configure/make hassles to make the installed
library search its own catalog. Currently the libstdc++.mo
catalog is only searched for the testsuite cases involving
messages members.
</li>

<li> The following member functions:

<p>
<code>
catalog
open(const basic_string&lt;char&gt;&amp; __s, const locale&amp; __loc) const;
</code>
</p>

<p>
<code>
catalog
open(const basic_string&lt;char&gt;&amp;, const locale&amp;, const char*) const;
</code>
</p>

<p>
don't actually return a "value less than 0 if no such catalog
can be opened" as required by the standard in the "gnu"
model. As of this writing, it is unknown how to query to see
if a specified message catalog exists using the gettext
package.
</p>
</li>
</ul>

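<p>
One possible approximation of such a query, offered only as a sketch and
not as something the library does, is to check for the binary catalog
file directly. gettext looks for a file named
<code>(dir)/(locale)/LC_MESSAGES/(domain).mo</code>, so the hypothetical
helpers below build that path and test whether it can be opened. The
check is approximate: gettext may also try fallback locale names such as
a bare language code, and it honours environment variables such as
<code>LANGUAGE</code>, none of which is modelled here.
</p>

```cpp
#include <fstream>
#include <string>

// Hypothetical helpers, for illustration only (not part of libstdc++
// or gettext).

// Builds the conventional path of a binary message catalog.
std::string mo_file_path(const std::string& dir,
                         const std::string& locale_name,
                         const std::string& domain)
{
  return dir + "/" + locale_name + "/LC_MESSAGES/" + domain + ".mo";
}

// Returns true if that exact file can be opened for reading.
// Fallback locale names and environment overrides are not considered.
bool mo_file_readable(const std::string& dir,
                      const std::string& locale_name,
                      const std::string& domain)
{
  std::ifstream in(mo_file_path(dir, locale_name, domain).c_str(),
                   std::ios::in | std::ios::binary);
  return in.is_open();
}
```

<p>
A caller could use <code>mo_file_readable</code> before 'open' and treat a
false result as "no such catalog", accepting that it may reject catalogs
gettext would still have found through a fallback.
</p>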
<h2>
7. Acknowledgments
</h2>
Ulrich Drepper for the character set explanations, gettext details,
and patient answering of late-night questions; Tom Tromey for the Java details.


<h2>
8. Bibliography / Referenced Documents
</h2>

<p>
Drepper, Ulrich, GNU libc (glibc) 2.2 manual. In particular, Chapter
"7 Locales and Internationalization"
</p>

<p>
Drepper, Ulrich, Thread-Aware Locale Model, A proposal. This is a
draft document describing the design of glibc 2.3 MT locale
functionality.
</p>

<p>
Drepper, Ulrich, Numerous, late-night email correspondence
</p>

<p>
ISO/IEC 9899:1999 Programming languages - C
</p>

<p>
ISO/IEC 14882:1998 Programming languages - C++
</p>

<p>
Java 2 Platform, Standard Edition, v 1.3.1 API Specification. In
particular, java.util.Properties, java.text.MessageFormat,
java.util.Locale, java.util.ResourceBundle.
http://java.sun.com/j2se/1.3/docs/api
</p>

<p>
System Interface Definitions, Issue 7 (IEEE Std. 1003.1-200x)
The Open Group/The Institute of Electrical and Electronics Engineers, Inc.
In particular see lines 5268-5427.
http://www.opennc.org/austin/docreg.html
</p>

<p>
GNU gettext tools, version 0.10.38, Native Language Support
Library and Tools.
http://sources.redhat.com/gettext
</p>

<p>
Langer, Angelika and Klaus Kreft, Standard C++ IOStreams and Locales,
Advanced Programmer's Guide and Reference, Addison Wesley Longman,
Inc. 2000. See page 725, Internationalized Messages.
</p>

<p>
Stroustrup, Bjarne, Appendix D, The C++ Programming Language, Special Edition, Addison Wesley, Inc. 2000
</p>

</body>
</html>
