<!DOCTYPE html PUBLIC
        '-//W3C//DTD XHTML 1.0 Transitional//EN'
        'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd'>

<html><head>
    <title>package overview</title>
<!--
/*
 * Copyright (C) 1999,2000,2001 The Free Software Foundation, Inc.
 */
-->
</head><body>

<p> This package contains Ælfred2, which includes an
enhanced SAX2-compatible version of the Ælfred
non-validating XML parser, a modular (and hence optional)
DTD validating parser, and modular (and hence optional)
JAXP glue to those.
Use these like any other SAX2 parsers. </p>
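<p>Because the parsers here implement the standard SAX2 interfaces, application code is written against <code>org.xml.sax</code> rather than against a particular parser. The sketch below is illustrative only. It uses whatever parser the JVM's default JAXP <code>SAXParserFactory</code> returns. The class <code>ElementCounter</code> is a hypothetical name. Making Ælfred2 the parser that is returned (for example through the <code>javax.xml.parsers.SAXParserFactory</code> system property) is assumed, not shown.</p>

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example handler: counts start-element events as they stream past.
final class ElementCounter extends DefaultHandler {
    private int count;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        count++;
    }

    int count() {
        return count;
    }

    // Parses an in-memory document with the default JAXP SAX parser and
    // returns the number of elements it contains.
    static int countElements(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            ElementCounter handler = new ElementCounter();
            reader.setContentHandler(handler);
            // DefaultHandler rethrows fatal errors, so malformed input fails fast.
            reader.setErrorHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.count();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse document", e);
        }
    }
}
```

<p>The handler keeps only a counter, so no document tree is ever built in memory. Swapping in a different SAX2 parser does not change this code.</p>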

<ul>
    <li><a href="#about">About Ælfred</a><ul>
        <li><a href="#principles">Design Principles</a></li>
        <li><a href="#name">About the Name Ælfred</a></li>
        <li><a href="#encodings">Character Encodings</a></li>
        <li><a href="#violations">Known Conformance Violations</a></li>
        <li><a href="#copyright">Licensing</a></li>
        </ul></li>

    <li><a href="#changes">Changes Since the Last Microstar Release</a><ul>
        <li><a href="#sax2">SAX2 Support</a></li>
        <li><a href="#validation">Validation</a></li>
        <li><a href="#smaller">You Want Smaller?</a></li>
        <li><a href="#bugfixes">Bugs Fixed</a></li>
        </ul></li>

</ul>

<h2><a name="about">About Ælfred</a></h2>

<p>Ælfred is an XML parser written in the Java programming language.</p>

<h3><a name="principles">Design Principles</a></h3>

<p>In most Java applets and applications, XML should not be the central
feature; instead, XML is the means to another end, such as loading
configuration information, reading meta-data, or parsing transactions.</p>

<p> When an XML parser is only a single component of a much larger
program, it cannot be large, slow, or resource-intensive.  With Java
applets, in particular, code size is a significant issue.  The typical
modem still does not run at 56 Kbit/s, and sometimes does not even use data
compression.  Assuming an uncompressed 28.8 Kbit/s modem, only about
3 KBytes can be downloaded in one second; compression often doubles
that speed, but a V.90 modem may not provide another doubling.  Similar
size concerns apply to embedded processors.  </p>
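<p>The "about 3 KBytes per second" figure can be checked with a short calculation. The sketch makes two simplifying assumptions. First, the modem uses an asynchronous serial link that sends ten bits per byte (eight data bits plus a start bit and a stop bit). Second, compression acts as a plain multiplier on throughput. <code>DownloadTime</code> is a hypothetical helper.</p>

```java
// Hypothetical helper for estimating download times over a modem link.
final class DownloadTime {
    // Assumption: 8 data bits + 1 start bit + 1 stop bit per byte on the wire.
    static final int BITS_PER_BYTE_ON_WIRE = 10;

    // Effective payload throughput in bytes per second. compressionRatio is
    // 1.0 for no compression and about 2.0 for a typical doubling.
    static double bytesPerSecond(int lineRateBitsPerSecond, double compressionRatio) {
        return (double) lineRateBitsPerSecond / BITS_PER_BYTE_ON_WIRE * compressionRatio;
    }

    // Seconds needed to transfer payloadBytes at the given line rate.
    static double secondsToDownload(long payloadBytes, int lineRateBitsPerSecond,
                                    double compressionRatio) {
        return payloadBytes / bytesPerSecond(lineRateBitsPerSecond, compressionRatio);
    }
}
```

<p>At 28,800 bit/s this gives 2,880 bytes per second. A hypothetical 28,800-byte download would therefore take 10 seconds uncompressed and 5 seconds with 2:1 compression.</p>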

<p> Ælfred is designed for easy and efficient use over the Internet,
based on the following principles: </p> <ol>

<li> Ælfred must be as small as possible, so that it doesn't add too
   much to an applet's download time. </li>

<li> Ælfred must use as few class files as possible, to minimize the
   number of HTTP connections necessary.  (The use of JAR files has made this
   less of a concern.) </li>

<li> Ælfred must be compatible with most or all Java implementations
   and platforms. (Write once, run anywhere.) </li>

<li> Ælfred must use as little memory as possible, so that it does
   not take away resources from the rest of your program.  (It doesn't force
   you to use DOM or a similar costly data structure API.)</li>

<li> Ælfred must run as fast as possible, so that it does not slow down
   the rest of your program. </li>

<li> Ælfred must produce correct output for well-formed and valid
   documents, but need not reject every document that is not valid or
   not well-formed. (In Ælfred2, correctness was a bigger concern
   than in the original version, and a validation option is available.) </li>

<li> Ælfred must provide full internationalization from the first
    release.  (Ælfred2 now automatically handles all encodings
    supported by the underlying JVM; previous versions handled only
    UTF-8, UTF-16, ASCII, and ISO-8859-1.)</li>

</ol>

<p>As you can see from this list, Ælfred is designed for production
use, but neither validation nor perfect conformance was a requirement.
Good validating parsers exist, including one in this package,
and you should use them as appropriate.  (See the conformance reviews
available at <a href="http://www.xml.com/">http://www.xml.com</a>.)
</p>

<p> One of the main goals of Ælfred2 was to significantly improve
conformance, while not significantly affecting the other goals stated above.
Since the only use of this parser is with SAX, some classes could be
removed, and so the overall size of Ælfred was actually reduced.
Subsequent performance work produced a notable speedup (over twenty
percent on larger files).  That is, the tradeoffs between speed, size, and
conformance were re-targeted towards conformance and support of newer APIs
(SAX2), with a positive performance impact. </p>

<p> The role anticipated for this version of Ælfred is as a
lightweight Free Software SAX parser that can be used in essentially every
Java program where the handful of conformance violations (noted below)
are acceptable.
That certainly includes applets, and
nowadays one must also mention embedded systems as being even more
size-critical.
At this writing, all parsers that are more conformant are
significantly larger, even when counting the optional
validation support in this version of Ælfred. </p>


<h3><a name="name">About the Name <em>Ælfred</em></a></h3>

<p>Ælfred the Great (AElfred in ASCII) was King of Wessex, and
some say King of England, at the time of his death in 899 AD.
Ælfred introduced a widespread literacy program in the hope that
his people would learn to read English, at least, if Latin was too
difficult for them.  This Ælfred hopes to bring another sort of
literacy to Java, using XML, at least, if full SGML is too difficult.</p>

<p>The initial Æ ligature ("AE") is also a reminder that XML is
not limited to ASCII.</p>


<h3><a name="encodings">Character Encodings</a></h3>

<p> The Ælfred parser currently builds in support for a handful
of input encodings.  Of course, these include UTF-8 and UTF-16, which
all XML parsers are required to support:</p> <ul>

    <li> UTF-8 ... the standard eight-bit encoding, used unless
    you provide an encoding declaration or a MIME charset tag.</li>

    <li> US-ASCII ... an extremely common seven-bit encoding,
    which happens to be a subset of UTF-8 and ISO-8859-1 as well
    as many other encodings.  XHTML web pages using US-ASCII
    (without an encoding declaration) are probably more
    widely interoperable than those in any other encoding. </li>

    <li> ISO-8859-1 ... includes accented characters used in
    much of western Europe (but excluding the Euro currency
    symbol).</li>

    <li> UTF-16 ... with several variants, this encodes each
    sixteen-bit Unicode character in sixteen bits of output.
    Variants include UTF-16BE (big endian, no byte order mark),
    UTF-16LE (little endian, no byte order mark), and
    ISO-10646-UCS-2 (an older and less used encoding, using a
    version of Unicode without surrogate pairs).  This is
    essentially the native encoding used by Java.  </li>

    <li> ISO-10646-UCS-4 ... a seldom-used four-byte encoding,
    also known as UTF-32BE.  Four byte order variants are supported,
    including one known as UTF-32LE.  Some operating systems
    standardized on UCS-4 despite its significant size penalty,
    in anticipation that Unicode (even with surrogate pairs)
    would eventually become limiting.  UCS-4 permits encoding
    of non-Unicode characters, which Java can't represent (and
    XML doesn't allow).
    </li>

    </ul>
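<p>Without an external label, an entity in any encoding other than UTF-8 must start with a byte order mark or an encoding declaration. The bytes of that mark, or of <code>&lt;?xml</code>, differ from one encoding family to the next, so a parser can tell these families apart before reading any declaration. The sketch below follows the autodetection table in Appendix F of the XML 1.0 specification. It is not Ælfred2's own code, and <code>EncodingAutodetect</code> is a hypothetical name.</p>

```java
// Hypothetical sketch of XML 1.0 Appendix F autodetection: classify the
// encoding family from the first four bytes of an entity.
final class EncodingAutodetect {
    static String guessFamily(byte[] head) {
        if (head.length < 4) {
            return "UTF-8"; // too short to tell; UTF-8 is the default
        }
        int sig = ((head[0] & 0xFF) << 24) | ((head[1] & 0xFF) << 16)
                | ((head[2] & 0xFF) << 8) | (head[3] & 0xFF);
        switch (sig) {
            // UCS-4 in its four byte orders, with a byte order mark or "<".
            case 0x0000FEFF: case 0x0000003C: return "UCS-4 (1234, big-endian)";
            case 0xFFFE0000: case 0x3C000000: return "UCS-4 (4321, little-endian)";
            case 0x0000FFFE: case 0x00003C00: return "UCS-4 (2143, unusual order)";
            case 0xFEFF0000: case 0x003C0000: return "UCS-4 (3412, unusual order)";
            // UTF-16 without a byte order mark: "<?" in two-byte units.
            case 0x003C003F: return "UTF-16BE (no byte order mark)";
            case 0x3C003F00: return "UTF-16LE (no byte order mark)";
            // "<?xm" in an ASCII-compatible encoding: read the declaration next.
            case 0x3C3F786D: return "ASCII-compatible (check the encoding declaration)";
            // "<?xm" in EBCDIC, a family AElfred2 does not recognize.
            case 0x4C6FA794: return "EBCDIC";
            default: break;
        }
        if ((sig >>> 16) == 0xFEFF) {
            return "UTF-16 (big-endian byte order mark)";
        }
        if ((sig >>> 16) == 0xFFFE) {
            return "UTF-16 (little-endian byte order mark)";
        }
        if ((sig >>> 8) == 0xEFBBBF) {
            return "UTF-8 (byte order mark)";
        }
        return "UTF-8"; // no recognizable signature: assume UTF-8
    }
}
```

<p>Only the ASCII-compatible result leads on to reading an encoding declaration. That is why the support for internal labels described below is limited to encodings derived from US-ASCII.</p>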

<p> If you use any encoding other than UTF-8 or UTF-16, you should
make sure to label your data appropriately: </p>

<blockquote>
&lt;?xml version="1.0" encoding="<b>ISO-8859-15</b>"?&gt;
</blockquote>

<p> Encodings accessed through <code>java.io.InputStreamReader</code>
are now fully supported for both external labels (such as MIME types)
and internal labels (as shown above).
There is one limitation in the support for internal labels:
the encodings must be derived from the US-ASCII encoding;
the EBCDIC family of encodings is not recognized.
Note that Java defines its
own encoding names, which don't always correspond to the standard
Internet encoding names defined by the IETF/IANA, and that Java
may even <em>require</em> use of nonstandard encoding names.
Please report
such problems; some of them can be worked around in this parser,
and many can be worked around by using external labels.
</p>
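<p>The ASCII restriction on internal labels follows from how such a label must be read: the parser has to decode the XML declaration before it knows the encoding. The sketch below shows one way to do that for ASCII-compatible input. It then hands the label to <code>java.nio.charset.Charset</code> and <code>java.io.InputStreamReader</code>. This is a simplification, not Ælfred2's implementation:</p>
<ul>
    <li><code>EncodingSniffer</code> is a hypothetical name.</li>
    <li>Byte order marks are not examined.</li>
    <li>An unknown label makes <code>Charset.forName</code> throw.</li>
</ul>

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: honour an internal encoding label in ASCII-compatible input.
final class EncodingSniffer {
    // Matches the encoding pseudo-attribute of an XML declaration at offset 0.
    private static final Pattern ENCODING_DECL = Pattern.compile(
        "^<\\?xml[^>]*?\\sencoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    static String declaredEncoding(byte[] document) {
        // The declaration must come first, so a short prefix is enough
        // (200 bytes is an arbitrary bound). ISO-8859-1 maps each byte to
        // one char, so ASCII bytes stay readable in any ASCII-compatible encoding.
        int n = Math.min(document.length, 200);
        String head = new String(document, 0, n, StandardCharsets.ISO_8859_1);
        Matcher m = ENCODING_DECL.matcher(head);
        return m.find() ? m.group(1) : "UTF-8"; // no label: XML's default
    }

    // Decodes the whole document using the declared (or default) encoding.
    static String decode(byte[] document) {
        Charset charset = Charset.forName(declaredEncoding(document));
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(document), charset)) {
            StringBuilder text = new StringBuilder();
            char[] buffer = new char[1024];
            int count;
            while ((count = reader.read(buffer)) != -1) {
                text.append(buffer, 0, count);
            }
            return text.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

<p>For an EBCDIC document, the bytes of <code>&lt;?xml</code> do not decode to those ASCII characters, so the pattern never matches. This is the same limitation noted in the paragraph above.</p>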

<p>Note that if you are using the Euro symbol with a fixed-length
eight-bit encoding, you should probably be using the encoding label
<em>iso-8859-15</em> or, with a Microsoft OS, <em>windows-1252</em>.
Of course, UTF-8 and UTF-16 handle the Euro symbol directly.
</p>
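<p>You can check whether a given charset can carry the Euro sign with <code>java.nio.charset.CharsetEncoder.canEncode</code>. The sketch below assumes the JVM supports every charset name passed to it. ISO-8859-15 and windows-1252 are common, but the Java platform does not require them. <code>EuroCheck</code> is a hypothetical name.</p>

```java
import java.nio.charset.Charset;

// Hypothetical helper: can the named charset represent U+20AC EURO SIGN?
final class EuroCheck {
    static boolean canEncodeEuro(String charsetName) {
        return Charset.forName(charsetName).newEncoder().canEncode('\u20AC');
    }
}
```

<p>With ISO-8859-1 this returns false, while ISO-8859-15, windows-1252 and UTF-8 all return true.</p>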


<h3><a name="violations">Known Conformance Violations</a></h3>

<p>Known conformance issues should be of negligible importance for
most applications, and include: </p><ul>

    <li> Rather than following the voluminous "Appendix B" rules about
    what characters may appear in names (and name tokens), the Unicode
    rules embedded in <em>java.lang.Character</em> are used.
    This means mostly that some names are inappropriately accepted,
    though a few are inappropriately rejected.  (It's much simpler
    to avoid that much special-case code.  Recent OASIS/NIST test
    cases may make these rules realistically testable.) </li>

    <li> Text containing "]]&gt;" is not rejected unless it fully resides
    in an internal buffer ... which is, thankfully, the typical case.  This
    text is illegal, but sometimes appears in illegal attempts to
    nest CDATA sections.  (Not catching that boundary condition
    substantially simplifies parsing text.) </li>

    <li> Surrogate characters that aren't correctly paired are ignored
    rather than rejected, unless they were encoded using UTF-8.  (This
    simplifies parsing text.)  Unicode 3.1 assigned the first characters
    to the code points that need surrogate pairs only in early 2001,
    so few documents (or tools) use such characters in any case. </li>

    <li> Declarations following references to an undefined parameter
    entity are not ignored. (Not maintaining and using state
    about this validity error simplifies declaration handling; few
    XML parsers address this constraint in any case.) </li>

    <li> Well-formedness constraints for general entity references
    are not enforced.  (The code to handle the "content" production
    is merged with the element parsing code, making it hard to reuse
    for this additional situation.) </li>

</ul>
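<p>The first three items above concern character-level checks. The sketch below illustrates each of them in simplified form. It is not taken from Ælfred2, and all names are hypothetical. The name test is only a rough approximation built on <code>java.lang.Character</code>, of the kind the first item describes; it is neither the exact rule set Ælfred2 uses nor the Appendix B tables. The <code>]]&gt;</code> detector carries a small amount of state from one chunk to the next. That is exactly the bookkeeping across buffer boundaries that the second item says Ælfred2 omits.</p>

```java
// Hypothetical, simplified versions of three character-level checks.
final class CharacterChecks {
    private CharacterChecks() {}

    // Rough XML name test built on java.lang.Character (an approximation,
    // not the XML 1.0 Appendix B tables).
    static boolean isRoughXmlName(String name) {
        if (name.isEmpty()) {
            return false;
        }
        char first = name.charAt(0);
        if (!(Character.isLetter(first) || first == '_' || first == ':')) {
            return false;
        }
        for (int i = 1; i < name.length(); i++) {
            char c = name.charAt(i);
            int type = Character.getType(c);
            boolean ok = Character.isLetterOrDigit(c) || c == '.' || c == '-'
                    || c == '_' || c == ':'
                    || type == Character.NON_SPACING_MARK
                    || type == Character.COMBINING_SPACING_MARK;
            if (!ok) {
                return false;
            }
        }
        return true;
    }

    // Detects "]]>" in character data delivered in chunks, including when
    // the three characters are split across chunk (buffer) boundaries.
    static final class CdataEndDetector {
        private int trailingBrackets; // ']' characters just seen, capped at 2

        boolean feed(CharSequence chunk) {
            for (int i = 0; i < chunk.length(); i++) {
                char c = chunk.charAt(i);
                if (c == ']') {
                    trailingBrackets = Math.min(trailingBrackets + 1, 2);
                } else if (c == '>' && trailingBrackets == 2) {
                    return true;
                } else {
                    trailingBrackets = 0;
                }
            }
            return false;
        }
    }

    // Index of the first unpaired surrogate in text, or -1 if all are paired.
    static int firstUnpairedSurrogate(CharSequence text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 < text.length() && Character.isLowSurrogate(text.charAt(i + 1))) {
                    i++; // skip the low half of a well-formed pair
                } else {
                    return i;
                }
            } else if (Character.isLowSurrogate(c)) {
                return i; // low surrogate without a preceding high surrogate
            }
        }
        return -1;
    }
}
```

<p>Feeding <code>"text]"</code> and then <code>"]&gt;more"</code> to one detector reports the terminator on the second call. A parser that examines each buffer in isolation would miss it.</p>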

<p> When tested against the July 12, 1999 version of the OASIS
XML Conformance test suite, an earlier version passed 1057 of 1067 tests.
That contrasts with the original version, which passed 867.  The
current parser is top-ranked in terms of conformance, as is its
validating sibling (which has some additional conformance violations
imposed on it by SAX2 API deficiencies as well as some of the more
curious SGML layering artifacts found in the XML specification). </p>

<p> The XML 1.0 specification itself was not without problems,
and after some delays the W3C has come out with a revised
"second edition" specification.  While that doesn't resolve all
the problems identified in the XML specification, many of the most
egregious problems have been resolved.  (You still need to drink
magic Kool-Aid before some DTD-related issues make sense.)
To the extent possible, this parser conforms to that second
edition specification, and does well against corrected versions
of the OASIS/NIST XML conformance test cases.  See <a href=
"http://xmlconf.sourceforge.net">http://xmlconf.sourceforge.net</a>
for more information about SAX2/XML conformance testing. </p>


<h3><a name="copyright">Copyright and distribution terms</a></h3>

<p>
The software in this package is distributed under the GNU General Public
License (with a special exception described below).
</p>

<p>
A copy of the GNU General Public License (GPL) is included in this distribution,
in the file COPYING.  If you do not have the source code, it is available at:

    <a href="http://www.gnu.org/software/classpath/">http://www.gnu.org/software/classpath/</a>
</p>
  | 
      
      
         | 270 | 
          | 
          | 
          
  | 
      
      
         | 271 | 
          | 
          | 
<pre>
  Linking this library statically or dynamically with other modules is
  making a combined work based on this library.  Thus, the terms and
  conditions of the GNU General Public License cover the whole
  combination.

  As a special exception, the copyright holders of this library give you
  permission to link this library with independent modules to produce an
  executable, regardless of the license terms of these independent
  modules, and to copy and distribute the resulting executable under
  terms of your choice, provided that you also meet, for each linked
  independent module, the terms and conditions of the license of that
  module.  An independent module is a module which is not derived from
  or based on this library.  If you modify this library, you may extend
  this exception to your version of the library, but you are not
  obligated to do so.  If you do not wish to do so, delete this
  exception statement from your version.

  Parts derived from code which carried the following notice:

  Copyright (c) 1997, 1998 by Microstar Software Ltd.

  AElfred is free for both commercial and non-commercial use and
  redistribution, provided that Microstar's copyright and disclaimer are
  retained intact.  You are free to modify AElfred for your own use and
  to redistribute AElfred with your modifications, provided that the
  modifications are clearly documented.

  This program is distributed in the hope that it will be useful, but
  WITHOUT ANY WARRANTY; without even the implied warranty of
  merchantability or fitness for a particular purpose.  Please use it AT
  YOUR OWN RISK.
</pre>

<p> Some of this documentation was modified from the original
Ælfred README.txt file.  All of it has been updated. </p>

</p>

<h2><a name="changes">Changes Since the Last Microstar Release</a></h2>

<p> As noted above, Microstar has not updated this parser since
the summer of 1998, when it released version 1.2a on its web site.
This release is intended to benefit the developer community by
refocusing the API on SAX2 and by improving conformance to the point
that most developers should not need another XML parser.  </p>

<p> The code has been cleaned up (for example, the production numbers
cited in comments now refer to the XML 1.0 specification rather than
a preliminary draft) and has been sped up somewhat.
JAXP support has been added, although developers are still
strongly encouraged to use the SAX2 APIs directly.  </p>


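<p> As a rough sketch of that advice, the Java class below uses JAXP only as a
factory and then works entirely through the SAX2 <code>XMLReader</code> interface.
It relies solely on standard JDK classes, so the parser it gets is whatever the
running JVM's JAXP configuration supplies (not necessarily Ælfred2). The names
<code>Sax2Direct</code> and <code>countElements</code> are illustrative only. </p>

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative sketch: JAXP is used only to obtain a parser; all further
// work goes through the SAX2 XMLReader interface. The parser class returned
// depends on the JVM's JAXP configuration (not necessarily AElfred2).
class Sax2Direct {

    // Counts the start-element events reported for the given XML text.
    static int countElements(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();

        final int[] count = {0};
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                count[0]++;
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countElements("<doc><a/><b><c/></b></doc>"));
    }
}
```

<p> For the sample document the program prints 4, one event each for
<code>doc</code>, <code>a</code>, <code>b</code> and <code>c</code>. </p>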
<h3><a name="sax2">SAX2 Support</a></h3>

<p> The original version of Ælfred did not support the
SAX2 APIs. </p>

<p> This version supports the SAX2 APIs and exposes the standard
boolean feature descriptors.  It supports the "DeclHandler" property,
which provides access to all DTD declarations not already exposed
through the SAX1 API.  The "LexicalHandler" property is also supported,
exposing entity boundaries (including the unnamed external subset),
comments, and CDATA section boundaries.  SAX1 compatibility is
currently provided as well.</p>


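<p> A minimal sketch of registering both handlers through the standard SAX2
property identifiers follows. It uses <code>org.xml.sax.ext.DefaultHandler2</code>
(part of SAX 2.0.2, not mentioned above), which implements both
<code>DeclHandler</code> and <code>LexicalHandler</code>. It assumes the
JAXP-supplied parser recognizes both properties; SAX2 parsers are allowed to
reject them with <code>SAXNotRecognizedException</code>. The class name
<code>Sax2Extensions</code> is hypothetical. </p>

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Illustrative sketch: one DefaultHandler2 (which implements both DeclHandler
// and LexicalHandler) is registered through the standard SAX2 property IDs.
// Parsers may reject these optional properties with SAXNotRecognizedException.
class Sax2Extensions {
    static final String DECL_HANDLER =
        "http://xml.org/sax/properties/declaration-handler";
    static final String LEXICAL_HANDLER =
        "http://xml.org/sax/properties/lexical-handler";

    // Parses xml and returns a log of selected DeclHandler/LexicalHandler events.
    static List<String> events(String xml) throws Exception {
        final List<String> log = new ArrayList<>();
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override
            public void elementDecl(String name, String model) {
                log.add("elementDecl " + name + " " + model);
            }

            @Override
            public void comment(char[] ch, int start, int length) {
                log.add("comment " + new String(ch, start, length));
            }

            @Override
            public void startCDATA() {
                log.add("startCDATA");
            }

            @Override
            public void endCDATA() {
                log.add("endCDATA");
            }
        };

        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        reader.setProperty(DECL_HANDLER, handler);
        reader.setProperty(LEXICAL_HANDLER, handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return log;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE doc [<!ELEMENT doc (#PCDATA)>]>"
                   + "<doc><!--hi--><![CDATA[x]]></doc>";
        for (String event : events(xml)) {
            System.out.println(event);
        }
    }
}
```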
<h3><a name="validation">Validation</a></h3>

<p> The 'pipeline' package in this same software distribution includes an
<a href="../pipeline/ValidationConsumer.html">XML Validation component</a>
that validates any full SAX2 event stream (one that includes all document
type declarations).  There is now an <a href="XmlReader.html">XmlReader</a> class
which combines that component with this enhanced Ælfred parser, creating
an optionally validating SAX2 parser. </p>

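<p> Validation is typically switched on through the standard SAX2 validation
feature. The sketch below assumes only a parser that recognizes
<code>http://xml.org/sax/features/validation</code> (the JDK's default parser
does); it does not construct Ælfred2's <code>XmlReader</code> directly, and the
class name <code>ValidatingParse</code> is hypothetical. Validity errors are
delivered to <code>ErrorHandler.error()</code> and do not stop the parse. </p>

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative sketch: turn DTD validation on or off through the standard
// SAX2 feature ID and collect the (non-fatal) validity errors reported.
class ValidatingParse {
    static final String VALIDATION = "http://xml.org/sax/features/validation";

    static List<String> validityErrors(String xml, boolean validate) throws Exception {
        final List<String> errors = new ArrayList<>();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setFeature(VALIDATION, validate);
        reader.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                // Validity errors are recoverable: record them and keep parsing.
                // fatalError() is inherited and rethrows well-formedness errors.
                errors.add(e.getMessage());
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        String invalid = "<!DOCTYPE a [<!ELEMENT a (b)><!ELEMENT b EMPTY>]><a><c/></a>";
        System.out.println(validityErrors(invalid, false)); // validation off
        System.out.println(validityErrors(invalid, true));  // validation on
    }
}
```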
<p> As noted in the documentation for that validating component, certain
validity constraints can't reliably be tested by a layered validator.
These include all constraints that rely on
layering violations (exposing XML at the level of tokens or below,
which is required because XML isn't a context-free grammar), some that
SAX2 doesn't support, and a few others.  The resulting validating
parser is conformant enough for most applications that aren't doing
strange SGML tricks with DTDs.
Moreover, that validating filter can be used without
a parser: any application component that emits SAX event streams
can DTD-validate its output on demand. </p>

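<p> To illustrate the layering idea (this is not the pipeline package's
<code>ValidationConsumer</code> API), here is a hypothetical SAX filter that
checks one simplified constraint: which child element types each element type
may contain. It ignores ordering and occurrence counts, so it is far weaker than
a DTD content model. The events come straight from application code, with no
parser involved. <code>ChildCheckFilter</code>, <code>sampleRules</code> and
<code>emitAndCheck</code> are illustrative names. </p>

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical layered checker (not the pipeline package's ValidationConsumer).
// It inspects SAX events as they pass through and checks only which child
// element types each element type may contain; ordering and occurrence
// counts, which a real DTD content model also constrains, are ignored.
class ChildCheckFilter extends XMLFilterImpl {
    private final Map<String, Set<String>> allowedChildren;
    private final Deque<String> open = new ArrayDeque<>();
    private final List<String> violations = new ArrayList<>();

    ChildCheckFilter(Map<String, Set<String>> allowedChildren) {
        this.allowedChildren = allowedChildren;
    }

    List<String> violations() {
        return violations;
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        if (!allowedChildren.containsKey(qName)) {
            violations.add("undeclared element " + qName);
        }
        String parent = open.peek();
        if (parent != null) {
            Set<String> allowed = allowedChildren.get(parent);
            if (allowed != null && !allowed.contains(qName)) {
                violations.add(qName + " not allowed in " + parent);
            }
        }
        open.push(qName);
        super.startElement(uri, localName, qName, atts); // forward downstream, if any
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        open.pop();
        super.endElement(uri, localName, qName);
    }

    // Sample rules: a "list" may contain only "item"; "item" and "note" are empty.
    static Map<String, Set<String>> sampleRules() {
        Map<String, Set<String>> rules = new HashMap<>();
        rules.put("list", new HashSet<>(Arrays.asList("item")));
        rules.put("item", new HashSet<String>());
        rules.put("note", new HashSet<String>());
        return rules;
    }

    // Plays the role of an application component that emits SAX events
    // directly (no parser involved) and checks its own output.
    static List<String> emitAndCheck(Map<String, Set<String>> rules,
                                     String parent, String... children)
            throws SAXException {
        ChildCheckFilter filter = new ChildCheckFilter(rules);
        Attributes none = new AttributesImpl();
        filter.startDocument();
        filter.startElement("", parent, parent, none);
        for (String child : children) {
            filter.startElement("", child, child, none);
            filter.endElement("", child, child);
        }
        filter.endElement("", parent, parent);
        filter.endDocument();
        return filter.violations();
    }

    public static void main(String[] args) throws SAXException {
        System.out.println(emitAndCheck(sampleRules(), "list", "item", "item"));
        System.out.println(emitAndCheck(sampleRules(), "list", "item", "note"));
    }
}
```

<p> Run as written, this prints <code>[]</code> for the first call and
<code>[note not allowed in list]</code> for the second. Because
<code>XMLFilterImpl</code> forwards each event only when a downstream handler
is set, the same filter could also sit between a parser and an application. </p>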
<h3><a name="smaller">You Want Smaller?</a></h3>

<p> You'll have noticed that the original version of Ælfred
had small size as a top goal.  Ælfred2 normally includes a
DTD validation layer, but you can package it without that layer.
Similarly, JAXP factory support is available but optional.
The main added cost of this revision is then the support for
the SAX2 API itself; DTD validation is as
cleanly layered as SAX2 allows.</p>

<h3><a name="bugfixes">Bugs Fixed</a></h3>

<p> Bugs fixed in Ælfred2 include: </p>

<ol>
    <li> Originally Ælfred didn't close file descriptors, which
    led to file descriptor leakage in programs that ran for any
    length of time. </li>

    <li> NOTATION declarations without system identifiers are
    now handled correctly. </li>

    <li> DTD events are now reported for all invocations of a
    given parser, not just the first one. </li>

    <li> More correct character handling: <ul>

        <li> Rejects out-of-range characters, both in text and in
        character references. </li>

        <li> Correctly handles character references that expand to
        surrogate pairs. </li>

        <li> Correctly handles UTF-8 encodings of surrogate pairs. </li>

        <li> Correctly handles Unicode 3.1 rules about illegal UTF-8
        encodings: there is only one legal encoding per character. </li>

        <li> PUBLIC identifiers are now rejected if they have illegal
        characters. </li>

        <li> The parser is more correct about what characters are allowed
        in names and name tokens.  It uses the Unicode rules built into Java
        rather than the voluminous XML rules, although some extensions
        have been made to match the XML rules more closely.</li>

        <li> Line ends are now normalized to newlines in all known
        cases. </li>

        </ul></li>

    <li> Certain validity errors were previously treated as
    well-formedness violations. <ul>

        <li> Repeated declarations of an element type are no
        longer fatal errors. </li>

        <li> Undeclared parameter entity references are no longer
        fatal errors. </li>

        </ul></li>

    <li> Attribute handling is improved: <ul>

        <li> Whitespace must exist between attributes. </li>

        <li> Only one value for a given attribute is permitted. </li>

        <li> ATTLIST declarations don't need to declare attributes. </li>

        <li> Attribute values are normalized when required. </li>

        <li> Tabs in attribute values are normalized to spaces. </li>

        <li> Attribute values containing a literal "&lt;" are rejected. </li>

        </ul></li>

    <li> More correct entity handling: <ul>

        <li> Whitespace must precede NDATA when declaring unparsed
        entities.</li>

        <li> Parameter entity declarations may not have NDATA annotations. </li>

        <li> The XML specification has a bug in that it doesn't identify
        certain contexts in which parameter entity
        expansion must not be performed.  Lacking an official erratum,
        this parser now disables such expansion inside comments,
        processing instructions, ignored sections, public identifiers,
        and parts of entity declarations. </li>

        <li> Entity expansions that include quote characters no longer
        confuse parsing of strings using such expansions. </li>

        <li> Whitespace in the values of internal entities is not mapped
        to space characters. </li>

        <li> General entity references in attribute defaults within the
        DTD now cause fatal errors when the entity is not defined at the
        time it is referenced. </li>

        <li> Malformed general entity references in entity declarations are
        now detected.  </li>

        </ul></li>

    <li> Neither conditional sections
    nor parameter entity references within markup declarations
    are permitted in the internal subset. </li>

    <li> Processing instructions whose target names are "XML"
    (ignoring case) are now rejected. </li>

    <li> Comments may not include "--".</li>

    <li> Most "]]>" sequences in text are rejected. </li>

    <li> Correct syntax for standalone declarations is enforced. </li>

    <li> Setting a locale for diagnostics now produces an exception only
    if the language of that locale isn't English. </li>

    <li> More encoding names are recognized.  These include the
    Unicode 3.0 variants of UTF-16 (UTF-16BE, UTF-16LE) as well as
    US-ASCII and a few commonly seen synonyms. </li>

    <li> Text (from character content, PIs, or comments) too large
    to fit into internal buffers is now handled correctly, including
    some cases that were originally handled incorrectly.</li>

    <li> Content is now reported for element types for which attributes
    have been declared, but no content model is known.  (Such documents
    are invalid, but may still be well-formed.) </li>

</ol>

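<p> Several of the well-formedness and normalization fixes listed above can be
checked with a small harness like the one below. It is a sketch that runs
against whatever SAX2 parser JAXP supplies (not necessarily Ælfred2); the
sample documents and the names <code>WellFormednessChecks</code>,
<code>isRejected</code> and <code>attributeAndText</code> are illustrative.
Each malformed sample should end in a fatal error, while the last call checks
that a tab in an attribute value becomes a space and that a CR LF line end
becomes a single newline. </p>

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative harness for a few of the conformance points listed above,
// run against whatever SAX2 parser JAXP supplies (not necessarily AElfred2).
class WellFormednessChecks {

    private static XMLReader newReader() throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        // DefaultHandler.fatalError rethrows; its warning() and error() do nothing.
        reader.setErrorHandler(new DefaultHandler());
        return reader;
    }

    // True if parsing stops with a fatal (well-formedness) error.
    static boolean isRejected(String xml) throws Exception {
        try {
            newReader().parse(new InputSource(new StringReader(xml)));
            return false;
        } catch (SAXParseException e) {
            return true;
        }
    }

    // Returns the root's "x" attribute value and the concatenated text content.
    static String[] attributeAndText(String xml) throws Exception {
        final String[] attr = new String[1];
        final StringBuilder text = new StringBuilder();
        XMLReader reader = newReader();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (attr[0] == null) {
                    attr[0] = atts.getValue("x");
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length); // text may arrive in several chunks
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return new String[] {attr[0], text.toString()};
    }

    public static void main(String[] args) throws Exception {
        String[] samples = {
            "<a x=\"1\"y=\"2\"/>",             // no whitespace between attributes
            "<a x=\"1\" x=\"2\"/>",            // repeated attribute
            "<a x=\"<\"/>",                    // literal '<' in an attribute value
            "<a><!-- x -- y --></a>",          // "--" inside a comment
            "<a><?xml version=\"1.0\"?></a>",  // PI target "xml"
            "<a>]]></a>",                      // "]]>" in text
        };
        for (String s : samples) {
            System.out.println(isRejected(s) + "  " + s);
        }
        String[] r = attributeAndText("<a x=\"1\t2\">one\r\ntwo</a>");
        System.out.println("[" + r[0] + "] [" + r[1].replace("\n", "\\n") + "]");
    }
}
```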
<p> Other bugs may also have been fixed. </p>

<p> For better overall validation support, some of the validity
constraints that can't be verified using the SAX2 event stream
are now reported directly by Ælfred2. </p>

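<p> Validity errors reach an application through <code>ErrorHandler.error()</code>,
while well-formedness violations go to <code>fatalError()</code>, so an
application can tell the two apart and keep parsing after a validity error.
The sketch below shows that separation with a JAXP-supplied validating parser
(not necessarily Ælfred2); <code>ErrorSeverity</code> and <code>Recorder</code>
are hypothetical names. A repeated element type declaration, for instance,
should not end the parse. </p>

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative sketch: separate recoverable validity errors from fatal
// well-formedness errors while parsing with a JAXP-supplied validating parser.
class ErrorSeverity {

    static final class Recorder extends DefaultHandler {
        final List<String> errors = new ArrayList<>();
        final List<String> fatalErrors = new ArrayList<>();
        boolean finished;

        @Override
        public void error(SAXParseException e) {
            errors.add(e.getMessage()); // validity error: parsing continues
        }

        @Override
        public void fatalError(SAXParseException e) throws SAXException {
            fatalErrors.add(e.getMessage());
            throw e; // well-formedness error: parsing cannot continue
        }

        @Override
        public void endDocument() {
            finished = true; // reached only if the parse was not aborted
        }
    }

    static Recorder parse(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        Recorder recorder = new Recorder();
        reader.setContentHandler(recorder);
        reader.setErrorHandler(recorder);
        try {
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXParseException e) {
            // Already recorded by fatalError(); nothing more to do here.
        }
        return recorder;
    }

    public static void main(String[] args) throws Exception {
        Recorder r = parse("<!DOCTYPE a [<!ELEMENT a EMPTY>]><a><b/></a>");
        System.out.println(r.finished + " " + r.errors + " " + r.fatalErrors);
    }
}
```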
</body></html>