<!DOCTYPE html PUBLIC
        '-//W3C//DTD XHTML 1.0 Transitional//EN'
        'http://www.w3.org/TR/xhtml1/DTD/transitional.dtd'>

<html><head>
    <title>package overview</title>
<!--
/*
 * Copyright (C) 1999,2000,2001 The Free Software Foundation, Inc.
 */
-->
</head><body>

<p> This package contains &AElig;lfred2, which includes an
enhanced SAX2-compatible version of the &AElig;lfred
non-validating XML parser, a modular (and hence optional)
DTD validating parser, and modular (and hence optional)
JAXP glue to those.
Use these like any other SAX2 parsers. </p>
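
<p> As a minimal sketch of that usage, the following Java class drives a SAX2
<code>XMLReader</code> and collects element names.  It assumes the JDK's default
JAXP parser so that it runs without this package installed; the class name
<code>SaxQuickStart</code> and its helper method are illustrative only.  With this
package on the classpath, an instance of its <code>XmlReader</code> class could
presumably be used in place of the reader obtained here. </p>

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of this package.
final class SaxQuickStart {

    /** Parses the given XML text and returns element names in document order. */
    static List<String> elementNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            // Assumption: the JDK's default parser stands in for any SAX2 XMLReader.
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    names.add(qName);
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(elementNames("<config><item id='1'/><item id='2'/></config>"));
    }
}
```

<p> Only the line that obtains the reader depends on the parser implementation;
the handler code is plain SAX2. </p>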

<ul>
    <li><a href="#about">About &AElig;lfred</a><ul>
        <li><a href="#principles">Design Principles</a></li>
        <li><a href="#name">About the Name &AElig;lfred</a></li>
        <li><a href="#encodings">Character Encodings</a></li>
        <li><a href="#violations">Known Conformance Violations</a></li>
        <li><a href="#copyright">Licensing</a></li>
        </ul></li>

    <li><a href="#changes">Changes Since the Last Microstar Release</a><ul>
        <li><a href="#sax2">SAX2 Support</a></li>
        <li><a href="#validation">Validation</a></li>
        <li><a href="#smaller">You Want Smaller?</a></li>
        <li><a href="#bugfixes">Bugs Fixed</a></li>
        </ul></li>

</ul>

<h2><a name="about">About &AElig;lfred</a></h2>

<p>&AElig;lfred is an XML parser written in the Java programming language.</p>

<h3><a name="principles">Design Principles</a></h3>

<p>In most Java applets and applications, XML should not be the central
feature; instead, XML is the means to another end, such as loading
configuration information, reading meta-data, or parsing transactions.</p>

<p> When an XML parser is only a single component of a much larger
program, it cannot be large, slow, or resource-intensive.  With Java
applets, in particular, code size is a significant issue.  The standard
modem is still not operating at 56 Kbaud, or sometimes even with data
compression.  Assuming an uncompressed 28.8 Kbaud modem, only about
3 KBytes can be downloaded in one second; compression often doubles
that speed, but a V.90 modem may not provide another doubling.  When
used with embedded processors, similar size concerns apply.  </p>

<p> &AElig;lfred is designed for easy and efficient use over the Internet,
based on the following principles: </p> <ol>

<li> &AElig;lfred must be as small as possible, so that it doesn't add too
   much to an applet's download time. </li>

<li> &AElig;lfred must use as few class files as possible, to minimize the
   number of HTTP connections necessary.  (The use of JAR files has made
   this less of a concern.) </li>

<li> &AElig;lfred must be compatible with most or all Java implementations
   and platforms. (Write once, run anywhere.) </li>

<li> &AElig;lfred must use as little memory as possible, so that it does
   not take away resources from the rest of your program.  (It doesn't force
   you to use DOM or a similar costly data structure API.)</li>

<li> &AElig;lfred must run as fast as possible, so that it does not slow down
   the rest of your program. </li>

<li> &AElig;lfred must produce correct output for well-formed and valid
   documents, but need not reject every document that is not valid or
   not well-formed. (In &AElig;lfred2, correctness was a bigger concern
   than in the original version; and a validation option is available.) </li>

<li> &AElig;lfred must provide full internationalization from the first
    release.  (&AElig;lfred2 now automatically handles all encodings
    supported by the underlying JVM; previous versions handled only
    UTF-8, UTF-16, ASCII, and ISO-8859-1.)</li>

</ol>

<p>As you can see from this list, &AElig;lfred is designed for production
use, but neither validation nor perfect conformance was a requirement.
Good validating parsers exist, including one in this package,
and you should use them as appropriate.  (See conformance reviews
available at <a href="http://www.xml.com/">http://www.xml.com</a>)
</p>

<p> One of the main goals of &AElig;lfred2 was to significantly improve
conformance, while not significantly affecting the other goals stated above.
Since the only use of this parser is with SAX, some classes could be
removed, and so the overall size of &AElig;lfred was actually reduced.
Subsequent performance work produced a notable speedup (over twenty
percent on larger files).  That is, the tradeoffs between speed, size, and
conformance were re-targeted towards conformance and support of newer APIs
(SAX2), with a positive performance impact. </p>

<p> The role anticipated for this version of &AElig;lfred is as a
lightweight Free Software SAX parser that can be used in essentially every
Java program where the handful of conformance violations (noted below)
are acceptable.
That certainly includes applets, and
nowadays one must also mention embedded systems as being even more
size-critical.
At this writing, all parsers that are more conformant are
significantly larger, even when counting the optional
validation support in this version of &AElig;lfred. </p>


<h3><a name="name">About the Name <em>&AElig;lfred</em></a></h3>

<p>&AElig;lfred the Great (AElfred in ASCII) was King of Wessex, and
some say King of England, at the time of his death in 899 AD.
&AElig;lfred introduced a wide-spread literacy program in the hope that
his people would learn to read English, at least, if Latin was too
difficult for them.  This &AElig;lfred hopes to bring another sort of
literacy to Java, using XML, at least, if full SGML is too difficult.</p>

<p>The initial &AElig; ligature ("AE") is also a reminder that XML is
not limited to ASCII.</p>


<h3><a name="encodings">Character Encodings</a></h3>

<p> The &AElig;lfred parser currently builds in support for a handful
of input encodings.  Of course these include UTF-8 and UTF-16, which
all XML parsers are required to support:</p> <ul>

    <li> UTF-8 ... the standard eight bit encoding, used unless
    you provide an encoding declaration or a MIME charset tag.</li>

    <li> US-ASCII ... an extremely common seven bit encoding,
    which happens to be a subset of UTF-8 and ISO-8859-1 as well
    as many other encodings.  XHTML web pages using US-ASCII
    (without an encoding declaration) are probably more
    widely interoperable than those in any other encoding. </li>

    <li> ISO-8859-1 ... includes accented characters used in
    much of western Europe (but excluding the Euro currency
    symbol).</li>

    <li> UTF-16 ... with several variants, this encodes each
    sixteen bit Unicode character in sixteen bits of output.
    Variants include UTF-16BE (big endian, no byte order mark),
    UTF-16LE (little endian, no byte order mark), and
    ISO-10646-UCS-2 (an older and less used encoding, using a
    version of Unicode without surrogate pairs).  This is
    essentially the native encoding used by Java.  </li>

    <li> ISO-10646-UCS-4 ... a seldom-used four byte encoding,
    also known as UTF-32BE.  Four byte order variants are supported,
    including one known as UTF-32LE.  Some operating systems
    standardized on UCS-4 despite its significant size penalty,
    in anticipation that Unicode (even with surrogate pairs)
    would eventually become limiting.  UCS-4 permits encoding
    of non-Unicode characters, which Java can't represent (and
    XML doesn't allow).
    </li>

    </ul>

<p> If you use any encoding other than UTF-8 or UTF-16 you should
make sure to label your data appropriately: </p>

<blockquote>
&lt;?xml version="1.0" encoding="<b>ISO-8859-15</b>"?&gt;
</blockquote>

<p> Encodings accessed through <code>java.io.InputStreamReader</code>
are now fully supported for both external labels (such as MIME types)
and internal labels (as shown above).
There is one limitation in the support for internal labels:
the encodings must be derived from the US-ASCII encoding;
the EBCDIC family of encodings is not recognized.
Note that Java defines its
own encoding names, which don't always correspond to the standard
Internet encoding names defined by the IETF/IANA, and that Java
may even <em>require</em> use of nonstandard encoding names.
Please report
such problems; some of them can be worked around in this parser,
and many can be worked around by using external labels.
</p>
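
<p> The difference between internal and external labels can be sketched in Java
as follows.  This is an illustration under stated assumptions, not this parser's
own code: it uses the JDK's default JAXP parser, the class name
<code>EncodingLabels</code> is hypothetical, and ISO-8859-1 is chosen only because
every Java platform is required to support it.  The external label is passed
through the standard <code>InputSource.setEncoding</code> call. </p>

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of this package.
final class EncodingLabels {

    /**
     * Parses raw bytes and returns the character content of the document.
     * externalLabel plays the role of a MIME charset; pass null to rely on
     * the encoding declaration inside the document (or on autodetection).
     */
    static String text(byte[] bytes, String externalLabel) {
        StringBuilder out = new StringBuilder();
        try {
            // Assumption: the JDK's default parser stands in for any SAX2 XMLReader.
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    out.append(ch, start, length);
                }
            });
            InputSource source = new InputSource(new ByteArrayInputStream(bytes));
            if (externalLabel != null) {
                source.setEncoding(externalLabel); // external label
            }
            reader.parse(source);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return out.toString();
    }
}
```

<p> An unlabeled ISO-8859-1 document containing accented characters would
normally be rejected as malformed UTF-8, which is why one of the two labels is
needed. </p>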

<p>Note that if you are using the Euro symbol with a fixed-length
eight bit encoding, you should probably be using the encoding label
<em>iso-8859-15</em> or, with a Microsoft OS, <em>cp-1252</em>.
Of course, UTF-8 and UTF-16 handle the Euro symbol directly.
</p>


<h3><a name="violations">Known Conformance Violations</a></h3>

<p>Known conformance issues should be of negligible importance for
most applications, and include: </p><ul>

    <li> Rather than following the voluminous "Appendix B" rules about
    what characters may appear in names (and name tokens), the Unicode
    rules embedded in <em>java.lang.Character</em> are used.
    This means mostly that some names are inappropriately accepted,
    though a few are inappropriately rejected.  (It's much simpler
    to avoid that much special case code.  Recent OASIS/NIST test
    cases may make these rules realistically testable.) </li>

    <li> Text containing "]]&gt;" is not rejected unless it fully resides
    in an internal buffer ... which is, thankfully, the typical case.  This
    text is illegal, but sometimes appears in illegal attempts to
    nest CDATA sections.  (Not catching that boundary condition
    substantially simplifies parsing text.) </li>

    <li> Surrogate characters that aren't correctly paired are ignored
    rather than rejected, unless they were encoded using UTF-8.  (This
    simplifies parsing text.)  Unicode 3.1 assigned the first characters
    to those character codes, in early 2001, so few documents (or tools)
    use such characters in any case. </li>

    <li> Declarations following a reference to an undefined parameter
    entity are not ignored. (Not maintaining and using state
    about this validity error simplifies declaration handling; few
    XML parsers address this constraint in any case.) </li>

    <li> Well formedness constraints for general entity references
    are not enforced.  (The code to handle the "content" production
    is merged with the element parsing code, making it hard to reuse
    for this additional situation.) </li>

</ul>
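
<p> To check whether one of the cases above matters for your documents, a small
probe along the following lines may help.  It is a sketch only: the class name
<code>ConformanceProbe</code> is hypothetical, and the reader used is the JDK's
default JAXP parser, which rejects "]]&gt;" in text.  Substituting another SAX2
parser, such as the one in this package, may give different answers for the
boundary cases listed above. </p>

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of this package.
final class ConformanceProbe {

    /** Returns true if the parser reports no fatal error for the document. */
    static boolean accepts(String xml) {
        try {
            // Assumption: swap in the XMLReader you actually want to probe.
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            // DefaultHandler.fatalError rethrows, so fatal errors end the parse.
            reader.setErrorHandler(new DefaultHandler());
            reader.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // well-formedness violation reported by the parser
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException("probe could not run", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("<a>text</a>"));
        System.out.println(accepts("<a>]]></a>"));
    }
}
```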

<p> When tested against the July 12, 1999 version of the OASIS
XML Conformance test suite, an earlier version passed 1057 of 1067 tests.
That contrasts with the original version, which passed 867.  The
current parser is top-ranked in terms of conformance, as is its
validating sibling (which has some additional conformance violations
imposed on it by SAX2 API deficiencies as well as some of the more
curious SGML layering artifacts found in the XML specification). </p>

<p> The XML 1.0 specification itself was not without problems,
and after some delays the W3C has come out with a revised
"second edition" specification.  While that doesn't resolve all
the problems identified in the XML specification, many of the most
egregious problems have been resolved.  (You still need to drink
magic Kool-Aid before some DTD-related issues make sense.)
To the extent possible, this parser conforms to that second
edition specification, and does well against corrected versions
of the OASIS/NIST XML conformance test cases.  See <a href=
"http://xmlconf.sourceforge.net">http://xmlconf.sourceforge.net</a>
for more information about SAX2/XML conformance testing. </p>


<h3><a name="copyright">Copyright and distribution terms</a></h3>

<p>
The software in this package is distributed under the GNU General Public
License (with a special exception described below).
</p>

<p>
A copy of the GNU General Public License (GPL) is included in this distribution,
in the file COPYING.  If you do not have the source code, it is available at:

    <a href="http://www.gnu.org/software/classpath/">http://www.gnu.org/software/classpath/</a>
</p>

<pre>
  Linking this library statically or dynamically with other modules is
  making a combined work based on this library.  Thus, the terms and
  conditions of the GNU General Public License cover the whole
  combination.

  As a special exception, the copyright holders of this library give you
  permission to link this library with independent modules to produce an
  executable, regardless of the license terms of these independent
  modules, and to copy and distribute the resulting executable under
  terms of your choice, provided that you also meet, for each linked
  independent module, the terms and conditions of the license of that
  module.  An independent module is a module which is not derived from
  or based on this library.  If you modify this library, you may extend
  this exception to your version of the library, but you are not
  obligated to do so.  If you do not wish to do so, delete this
  exception statement from your version.

  Parts derived from code which carried the following notice:

  Copyright (c) 1997, 1998 by Microstar Software Ltd.

  AElfred is free for both commercial and non-commercial use and
  redistribution, provided that Microstar's copyright and disclaimer are
  retained intact.  You are free to modify AElfred for your own use and
  to redistribute AElfred with your modifications, provided that the
  modifications are clearly documented.

  This program is distributed in the hope that it will be useful, but
  WITHOUT ANY WARRANTY; without even the implied warranty of
  merchantability or fitness for a particular purpose.  Please use it AT
  YOUR OWN RISK.
</pre>

<p> Some of this documentation was modified from the original
&AElig;lfred README.txt file.  All of it has been updated. </p>



<h2><a name="changes">Changes Since the Last Microstar Release</a></h2>

<p> As noted above, Microstar has not updated this parser since
the summer of 1998, when it released version 1.2a on its web site.
This release is intended to benefit the developer community by
refocusing the API on SAX2, and improving conformance to the extent
that most developers should not need to use another XML parser.  </p>

<p> The code has been cleaned up (referring to the XML 1.0 spec in
all the production numbers in
comments, rather than some preliminary draft, for one example) and
has been sped up a bit as well.
JAXP support has been added, although developers are still
strongly encouraged to use the SAX2 APIs directly.  </p>


<h3><a name="sax2">SAX2 Support</a></h3>

<p> The original version of &AElig;lfred did not support the
SAX2 APIs. </p>

<p> This version supports the SAX2 APIs, exposing the standard
boolean feature descriptors.  It supports the "DeclHandler" property
to provide access to all DTD declarations not already exposed
through the SAX1 API.  The "LexicalHandler" property is supported,
exposing entity boundaries (including the unnamed external subset) and
things like comments and CDATA boundaries.  SAX1 compatibility is
currently provided.</p>
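
<p> A sketch of how these two handler properties are typically installed is shown
below.  The property URIs are the standard SAX2 ones; the class name
<code>ExtensionHandlers</code>, the log format, and the use of the JDK's default
JAXP parser are assumptions made so the example runs without this package. </p>

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical helper class, not part of this package.
final class ExtensionHandlers {

    /** Parses the text and logs the DeclHandler and LexicalHandler events seen. */
    static List<String> events(String xml) {
        List<String> log = new ArrayList<>();
        // DefaultHandler2 implements both LexicalHandler and DeclHandler.
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override
            public void elementDecl(String name, String model) {
                log.add("elementDecl:" + name);
            }
            @Override
            public void comment(char[] ch, int start, int length) {
                log.add("comment:" + new String(ch, start, length));
            }
            @Override
            public void startCDATA() {
                log.add("startCDATA");
            }
            @Override
            public void endCDATA() {
                log.add("endCDATA");
            }
        };
        try {
            // Assumption: the JDK's default parser stands in for any SAX2 XMLReader.
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
            reader.setProperty("http://xml.org/sax/properties/declaration-handler", handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return log;
    }
}
```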


<h3><a name="validation">Validation</a></h3>

<p> In the 'pipeline' package in this same software distribution is an
<a href="../pipeline/ValidationConsumer.html">XML Validation component</a>
using any full SAX2 event stream (including all document type declarations)
to validate.  There is now an <a href="XmlReader.html">XmlReader</a> class
which combines that class and this enhanced &AElig;lfred parser, creating
an optionally validating SAX2 parser. </p>

<p> As noted in the documentation for that validating component, certain
validity constraints can't reliably be tested by a layered validator.
These include all constraints relying on
layering violations (exposing XML at the level of tokens or below,
required since XML isn't a context-free grammar), some that
SAX2 doesn't support, and a few others.  The resulting validating
parser is conformant enough for most applications that aren't doing
strange SGML tricks with DTDs.
Moreover, that validating filter can be used without
a parser ... any application component that emits SAX event streams
can DTD-validate its output on demand. </p>
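
<p> From the application's side, optional validation is usually switched on
through the standard SAX2 validation feature, with validity errors arriving at
the <code>ErrorHandler</code>.  The sketch below shows that pattern; it assumes
the JDK's default JAXP parser rather than this package's classes, and the class
name <code>ValidationSketch</code> is hypothetical. </p>

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of this package.
final class ValidationSketch {

    /** Parses with DTD validation enabled and returns the validity errors reported. */
    static List<String> validityErrors(String xml) {
        List<String> errors = new ArrayList<>();
        try {
            // Assumption: the JDK's default parser stands in for any validating SAX2 parser.
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setFeature("http://xml.org/sax/features/validation", true);
            reader.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    // Validity errors are recoverable, so record them and continue.
                    errors.add(e.getMessage());
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return errors;
    }
}
```

<p> A document that is well formed but invalid still parses to completion; only
the error list distinguishes it from a valid one. </p>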

<h3><a name="smaller">You Want Smaller?</a></h3>

<p> You'll have noticed that the original version of &AElig;lfred
had small size as a top goal.  &AElig;lfred2 normally includes a
DTD validation layer, but you can package it without that layer.
Similarly, JAXP factory support is available but optional.
The main added cost of this revision is then
supporting the SAX2 API itself; DTD validation is as
cleanly layered as allowed by SAX2.</p>

<h3><a name="bugfixes">Bugs Fixed</a></h3>

<p> Bugs fixed in &AElig;lfred2 include: </p>

<ol>
    <li> Originally &AElig;lfred didn't close file descriptors, which
    led to file descriptor leakage on programs which ran for any
    length of time. </li>

    <li> NOTATION declarations without system identifiers are
    now handled correctly. </li>

    <li> DTD events are now reported for all invocations of a
    given parser, not just the first one. </li>

    <li> More correct character handling: <ul>

        <li> Rejects out-of-range characters, both in text and in
        character references. </li>

        <li> Correctly handles character references that expand to
        surrogate pairs. </li>

        <li> Correctly handles UTF-8 encodings of surrogate pairs. </li>

        <li> Correctly handles Unicode 3.1 rules about illegal UTF-8
        encodings: there is only one legal encoding per character. </li>

        <li> PUBLIC identifiers are now rejected if they have illegal
        characters. </li>

        <li> The parser is more correct about what characters are allowed
        in names and name tokens.  Uses Unicode rules (built in to Java)
        rather than the voluminous XML rules, although some extensions
        have been made to match XML rules more closely.</li>

        <li> Line ends are now normalized to newlines in all known
        cases. </li>

        </ul></li>

    <li> Certain validity errors were previously treated as well
    formedness violations. <ul>

        <li> Repeated declarations of an element type are no
        longer fatal errors. </li>

        <li> Undeclared parameter entity references are no longer
        fatal errors. </li>

        </ul></li>

    <li> Attribute handling is improved: <ul>

        <li> Whitespace must exist between attributes. </li>

        <li> Only one value for a given attribute is permitted. </li>

        <li> ATTLIST declarations don't need to declare attributes. </li>

        <li> Attribute values are normalized when required. </li>

        <li> Tabs in attribute values are normalized to spaces. </li>

        <li> Attribute values containing a literal "&lt;" are rejected. </li>

        </ul></li>

    <li> More correct entity handling: <ul>

        <li> Whitespace must precede NDATA when declaring unparsed
        entities.</li>

        <li> Parameter entity declarations may not have NDATA annotations. </li>

        <li> The XML specification has a bug in that it doesn't specify
        that certain contexts exist within which parameter entity
        expansion must not be performed.  Lacking an official erratum,
        this parser now disables such expansion inside comments,
        processing instructions, ignored sections, public identifiers,
        and parts of entity declarations. </li>

        <li> Entity expansions that include quote characters no longer
        confuse parsing of strings using such expansions. </li>

        <li> Whitespace in the values of internal entities is not mapped
        to space characters. </li>

        <li> General Entity references in attribute defaults within the
        DTD now cause fatal errors when the entity is not defined at the
        time it is referenced. </li>

        <li> Malformed general entity references in entity declarations are
        now detected.  </li>

        </ul></li>

    <li> Neither conditional sections
    nor parameter entity references within markup declarations
    are permitted in the internal subset. </li>

    <li> Processing instructions whose target names are "XML"
    (ignoring case) are now rejected. </li>

    <li> Comments may not include "--".</li>

    <li> Most "]]&gt;" sequences in text are rejected. </li>

    <li> Correct syntax for standalone declarations is enforced. </li>

    <li> Setting a locale for diagnostics only produces an exception
    if the language of that locale isn't English. </li>

    <li> Some more encoding names are recognized.  These include the
    Unicode 3.0 variants of UTF-16 (UTF-16BE, UTF-16LE) as well as
    US-ASCII and a few commonly seen synonyms. </li>

    <li> Text (from character content, PIs, or comments) large enough
    not to fit into internal buffers is now handled correctly even in
    some cases which were originally handled incorrectly.</li>

    <li> Content is now reported for element types for which attributes
    have been declared, but no content model is known.  (Such documents
    are invalid, but may still be well formed.) </li>

</ol>
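
<p> Two of the fixes listed above, tabs in attribute values and line-end
normalization, are easy to observe through the SAX API.  The following sketch
does so with the JDK's default JAXP parser as a stand-in for any conforming
SAX2 parser; the class name <code>NormalizationCheck</code> and the attribute
name <code>x</code> are illustrative assumptions. </p>

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of this package.
final class NormalizationCheck {

    /**
     * Parses the text and returns a two-element array: the value of the
     * first "x" attribute found (or null) and all character content.
     */
    static String[] parse(String xml) {
        String[] attribute = new String[1];
        StringBuilder text = new StringBuilder();
        try {
            // Assumption: the JDK's default parser stands in for any SAX2 XMLReader.
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    if (attribute[0] == null) {
                        attribute[0] = atts.getValue("x");
                    }
                }
                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length);
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return new String[] { attribute[0], text.toString() };
    }
}
```

<p> A literal tab inside the attribute value should come back as a space, and a
carriage return followed by a line feed in content should come back as a single
newline. </p>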

<p> Other bugs may also have been fixed. </p>

<p> For better overall validation support, some of the validity
constraints that can't be verified using the SAX2 event stream
are now reported directly by &AElig;lfred2. </p>

</body></html>