<html>
<body>

<p>
This is a Free Software DOM Level 3 implementation, supporting these features:
<ul>
<li>"XML"</li>
<li>"Events"</li>
<li>"MutationEvents"</li>
<li>"HTMLEvents" (won't generate them though)</li>
<li>"UIEvents" (also won't generate them)</li>
<li>"USER-Events" (a conformant extension)</li>
<li>"Traversal" (optional)</li>
<li>"XPath"</li>
<li>"LS" and "LS-Async"</li>
</ul>
It is intended to be a reasonable base both for
experimentation and for supporting additional DOM modules as clean layers.
</p>
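<p>A minimal sketch of how an application might ask a DOM implementation which of these features it claims, using the standard <code>DOMImplementation.hasFeature</code> method. It assumes the implementation is obtained through JAXP's <code>DocumentBuilderFactory</code>, so it reports on whatever DOM the runtime is configured with (by default the JDK's built-in one, not necessarily this package). The version string "3.0" and the class name <code>FeatureProbe</code> are illustrative choices.</p>

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMImplementation;

// Hypothetical helper: asks a DOMImplementation which of the feature
// names listed above it claims to support at a given version.
class FeatureProbe {
    static final String[] FEATURES = {
        "XML", "Events", "MutationEvents", "HTMLEvents", "UIEvents",
        "USER-Events", "Traversal", "XPath", "LS", "LS-Async"
    };

    static Map<String, Boolean> probe(DOMImplementation impl, String version) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String feature : FEATURES) {
            // hasFeature only reports what the implementation claims.
            result.put(feature, impl.hasFeature(feature, version));
        }
        return result;
    }

    // Whatever DOM the JAXP factory is configured to return.
    static DOMImplementation defaultImplementation() {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().getDOMImplementation();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        probe(defaultImplementation(), "3.0")
            .forEach((name, ok) -> System.out.println(name + ": " + ok));
    }
}
```

<p>The answers depend entirely on the implementation being asked; a layered module would normally make <code>hasFeature</code> return true for its own feature name.</p>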

<p>
Note that while DOM does not specify its behavior in the
face of concurrent access, this implementation does.
Specifically:
<ul>
<li>If only one thread at a time accesses a Document,
or if several threads cooperate for read-only access,
then no concurrency conflicts will occur.</li>
<li>If several threads mutate a given document
(or send events using it) at the same time,
there is currently no guarantee that
they won't interfere with each other.</li>
</ul>
</p>
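<p>Under these rules, an application that shares one Document among threads has to serialize mutations itself. A minimal sketch, assuming a <code>java.util.concurrent</code> read/write lock is acceptable: readers share the lock, relying on this package's statement that cooperating read-only access is safe (other DOM implementations, including the JDK's default, may not make that promise), while any mutation or event dispatch takes the exclusive write lock. <code>GuardedDocument</code> is a hypothetical name.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Consumer;
import java.util.function.Function;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;

// Hypothetical wrapper: many concurrent readers, or one writer at a time.
class GuardedDocument {
    private final Document doc;
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    GuardedDocument(Document doc) { this.doc = doc; }

    // Read-only access; several threads may hold the read lock at once
    // (safe only if the DOM tolerates concurrent readers, as this one says).
    <T> T read(Function<Document, T> reader) {
        lock.readLock().lock();
        try { return reader.apply(doc); } finally { lock.readLock().unlock(); }
    }

    // Mutations (and event dispatch) get exclusive access.
    void write(Consumer<Document> writer) {
        lock.writeLock().lock();
        try { writer.accept(doc); } finally { lock.writeLock().unlock(); }
    }

    // Demo: several threads append children to the same root element.
    static int appendConcurrently(int threads, int perThread) {
        Document d;
        try {
            d = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        d.appendChild(d.createElement("root"));
        GuardedDocument g = new GuardedDocument(d);

        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread w = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    g.write(doc -> doc.getDocumentElement()
                                      .appendChild(doc.createElement("item")));
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return g.read(doc -> doc.getDocumentElement().getChildNodes().getLength());
    }
}
```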

<h3>Design Goals</h3>

<p>
A number of DOM implementations are available in Java, including
commercial ones from Sun, IBM, Oracle, and DataChannel as well as
noncommercial ones from Docuverse, OpenXML, and Silfide. Why have
another? Some of the goals of this version:
</p>

<ul>
<li>Advanced DOM support. This was the first generally available
implementation of DOM Level 2 in Java, and one of the first Level 3
and XPath implementations.</li>

<li> Free Software. This one is distributed under the GPL (with
"library exception") so it can be used with a different class of
application.</li>

<li>Second implementation syndrome. I can do it simpler this time
around ... and heck, writing it only takes a bit over a day once you
know your way around.</li>

<li>Sanity check the then-current Last Call DOM draft. Best to find
bugs early, when they're relatively fixable. Yes, bugs were found.</li>

<li>Modularity. Most of the implementations mentioned above are part
of huge packages; take all (including bugs, of which some have far
too many), or take nothing. I prefer a menu approach, when possible.
This code is standalone, not beholden to any particular parser or XSL
or XPath code.</li>

<li>OK, I'm a hacker, I like to write code.</li>
</ul>

<p>
This also works with the GNU Compiler for Java (GCJ). GCJ promises
to be quite the environment for programming Java, both directly and from
C++ using the new CNI interfaces (which really use C++, unlike JNI). </p>


<h3>Open Issues</h3>

<p>At this writing:</p>
<ul>
<li>See below for some restrictions on the mutation event
support ... some events aren't reported (and likely won't be).</li>

<li>More testing and conformance work is needed.</li>

<li>We need an XML Schema validator (actually we need validation in the DOM
full stop).</li>
</ul>

<p>
I ran a profiler a few times and removed some of the performance hotspots,
but it's not tuned. Reporting mutation events, in particular, is
rather costly: it started at about a 40% penalty for appendChild calls,
and I've got it down to around 12%, but it'll be hard to shrink it much further.
The overall code size is relatively small, though you may want to be rid of
many of the unused DOM interface classes (HTML, CSS, and so on).
</p>


<h2><a name="features">Features of this Package</a></h2>

<p> Starting with DOM Level 2, you can really see that DOM is constructed
as a bunch of optional modules around a core of either XML or HTML
functionality. Different implementations will support different optional
modules. This implementation provides a set of features that should be
useful if you're not depending on the HTML functionality (lots of convenience
functions that mostly don't buy much except API surface area) and user
interface support. That is, browsers will want more -- but what they
need should be cleanly layered over what's already here. </p>

<h3> Core Feature Set: "XML" </h3>

<p> This DOM implementation supports the "XML" feature set, which basically
gets you four things over the bare core (which you're officially not supposed
to implement except in conjunction with the "XML" or "HTML" feature). In
order of decreasing utility, those four things are: </p> <ol>

<li> ProcessingInstruction nodes. These are probably the most
valuable thing. Handy little buggers, in part because all the APIs
you need to use them are provided, and they're designed to let you
escape XML document structure rules in controlled ways.</li>

<li> CDATASection nodes. These are of limited utility since CDATA
is just text that prints funny. These are of use to some sorts of
applications, though I encourage folk to not use them. </li>

<li> DocumentType nodes, and associated Notation and Entity nodes.
These appear to be useless. Briefly, these "Type" nodes expose no
typing information. They're only really usable to expose some lexical
structure that almost every application needs to ignore. (XML editors
might like to see them, but they need true typing information much more.)
I strongly encourage people not to use these. </li>

<li> EntityReference nodes can show up. These are actively annoying,
since they add an extra level of hierarchy, are the cause of most of
the complexity in attribute values, and their contents are immutable.
Avoid these.</li>

</ol>
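<p>A short sketch of the first two items, using only standard <code>org.w3c.dom</code> calls against whatever DOM JAXP supplies (an assumption about your runtime; the class name <code>XmlFeatureNodes</code> is hypothetical). It creates a ProcessingInstruction, whose target and data are exposed directly, and shows that a CDATASection carries exactly the same character data as an ordinary Text node; only its node type, and the way it is serialized, differ.</p>

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.CDATASection;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.ProcessingInstruction;
import org.w3c.dom.Text;

class XmlFeatureNodes {
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        Element root = doc.createElement("root");
        doc.appendChild(root);

        // A PI: a target plus free-form data, placed before the root element.
        ProcessingInstruction pi = doc.createProcessingInstruction(
            "xml-stylesheet", "href='style.css' type='text/css'");
        doc.insertBefore(pi, root);
        System.out.println(pi.getTarget() + " / " + pi.getData());

        // CDATA vs. plain text: same characters, different node type.
        CDATASection cdata = doc.createCDATASection("a < b && c");
        Text text = doc.createTextNode("a < b && c");
        root.appendChild(cdata);
        root.appendChild(text);
        System.out.println(cdata.getData().equals(text.getData()));          // true
        System.out.println(cdata.getNodeType() == Node.CDATA_SECTION_NODE);  // true
    }
}
```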

<h3> Optional Feature Sets: "Events", and friends </h3>

<p> Events may be one of the more interesting new features in Level 2.
This package provides the core feature set and exposes mutation events.
No gooey events though; if you want that, write a layered implementation! </p>
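<p>A minimal sketch of observing mutation events: a listener registered on an element through the standard <code>org.w3c.dom.events.EventTarget</code> interface counts <code>DOMNodeInserted</code> events as they bubble up from newly appended children. It assumes the DOM returned by JAXP implements the events module (the JDK's built-in DOM does, but treat that as an assumption about your runtime); <code>InsertCounter</code> is a hypothetical name.</p>

```java
import java.util.concurrent.atomic.AtomicInteger;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.events.EventTarget;

class InsertCounter {
    // Appends n children to a root element and returns how many
    // DOMNodeInserted events reached a listener on that root.
    static int countInsertions(int n) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element root = doc.createElement("root");
        doc.appendChild(root);

        AtomicInteger seen = new AtomicInteger();
        // useCapture = false: the listener hears the event while it bubbles
        // up from the inserted child (the event's target).
        ((EventTarget) root).addEventListener(
            "DOMNodeInserted", evt -> seen.incrementAndGet(), false);

        for (int i = 0; i < n; i++) {
            root.appendChild(doc.createElement("child"));
        }
        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println(countInsertions(3));
    }
}
```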

<p> Three mutation events aren't currently generated:</p> <ul>

<li> <em>DOMSubtreeModified</em> is poorly specified. Think of this
as generating one such event around the time of finalization, which
is a fully conformant implementation. This implementation is exactly
as useful as that one. </li>

<li> <em>DOMNodeRemovedFromDocument</em> and
<em>DOMNodeInsertedIntoDocument</em> are supposed to get sent to
every node in a subtree that gets removed or inserted (respectively).
This can be <em>extremely costly</em>, and the removal and insertion
processing is already significantly slower due to event reporting.
It's much easier, and more efficient, to have a listener higher in the
tree watch removal and insertion events through the bubbling or capture
mechanisms, than it is to watch for these two events.</li>

</ul>
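<p>The cheaper alternative described in the second item can be sketched as follows, again assuming a JAXP-supplied DOM with the events module: one listener on an ancestor hears <code>DOMNodeRemoved</code> as it bubbles, and walks the removed subtree itself only when it actually cares, instead of every descendant receiving <code>DOMNodeRemovedFromDocument</code>. <code>RemovalWatcher</code> and its methods are hypothetical names.</p>

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.events.EventTarget;

class RemovalWatcher {
    // Collects element names in a subtree, in document order.
    static void collect(Node n, List<String> out) {
        if (n.getNodeType() == Node.ELEMENT_NODE) out.add(n.getNodeName());
        for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
            collect(c, out);
        }
    }

    static List<String> removeAndReport() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element root = doc.createElement("book");
        doc.appendChild(root);
        Element section = doc.createElement("section");
        section.appendChild(doc.createElement("para"));
        section.appendChild(doc.createElement("para"));
        root.appendChild(section);

        List<String> removed = new ArrayList<>();
        // One bubbling listener near the top of the tree. DOMNodeRemoved is
        // dispatched before the node is detached, with the removed node as target.
        ((EventTarget) root).addEventListener("DOMNodeRemoved",
            evt -> collect((Node) evt.getTarget(), removed), false);

        root.removeChild(section);
        return removed;   // [section, para, para]
    }
}
```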

<p> In addition, certain kinds of attribute modification aren't reported.
A fix is known, but it couldn't report the previous value of the attribute.
More work could fix all of this (as well as reduce the generally high cost
of childful attributes), but that's not been done yet. </p>

<p> Also, note that it is a <em>Bad Thing™</em> to have the listener
for a mutation event change the ancestry for the target of that event.
It's just as bad to prevent mutation events from bubbling to where they're needed.
Just don't do those, OK? </p>

<p> As an experimental feature (named "USER-Events"), you can provide
your own "user" events. Just name them anything starting with "USER-"
and you're set. Dispatch them using bubbling, capturing, or whatever
takes your fancy. One important thing you can't currently do is
pass any data (like an object) with those events. Maybe later there
will be a "UserEvent" interface letting you get some substantial use
out of this mechanism even if you're not "inside" of a DOM package.</p>
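<p>A sketch of dispatching such a user event through the standard <code>DocumentEvent</code> and <code>EventTarget</code> interfaces. The "USER-" prefix is this package's convention; the sketch runs against whatever DOM JAXP supplies (an assumption), which may accept other type names too. As noted above, no payload travels with the event, so the listener only sees its type and target. <code>UserEventDemo</code> and the type name "USER-refresh" are hypothetical.</p>

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.events.DocumentEvent;
import org.w3c.dom.events.Event;
import org.w3c.dom.events.EventTarget;

class UserEventDemo {
    static List<String> dispatchUserEvent() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element root = doc.createElement("root");
        Element leaf = doc.createElement("leaf");
        doc.appendChild(root);
        root.appendChild(leaf);

        List<String> log = new ArrayList<>();
        // Listener on the ancestor; it sees the event during bubbling.
        ((EventTarget) root).addEventListener("USER-refresh", ev ->
            log.add(ev.getType() + "@" + ((Node) ev.getTarget()).getNodeName()), false);

        // Generic "Events" module event: type, canBubble, cancelable.
        Event event = ((DocumentEvent) doc).createEvent("Events");
        event.initEvent("USER-refresh", true, false);
        ((EventTarget) leaf).dispatchEvent(event);
        return log;   // [USER-refresh@leaf]
    }
}
```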

<p> You can create and send HTML events. Ditto UIEvents. Since DOM
doesn't require a UI, it's the UI's job to send them; perhaps that's
part of your application. </p>

<p><em>This package may be built without the ability to report mutation
events, gaining a significant speedup in DOM construction time. However,
if that is done then certain other features -- notably node iterators
and getElementsByTagName -- will not be available.</em></p>


<h3> Optional Feature: "Traversal" </h3>

<p> Each DOM node has all you need to walk to everything connected
to that node. Lightweight, efficient utilities are easily layered on
top of just the core APIs. </p>
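<p>For example, a depth-first walk needs nothing beyond <code>getFirstChild</code>, <code>getNextSibling</code>, and <code>getParentNode</code> from the core <code>Node</code> interface. A minimal sketch (the class and method names are hypothetical) that lists elements in document order without recursion, so deep trees can't overflow the stack:</p>

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class CoreWalk {
    // Names of all elements under (and including) start, in document order,
    // found with core Node navigation only.
    static List<String> elementNames(Node start) {
        List<String> names = new ArrayList<>();
        Node n = start;
        while (n != null) {
            if (n.getNodeType() == Node.ELEMENT_NODE) names.add(n.getNodeName());
            if (n.getFirstChild() != null) {
                n = n.getFirstChild();   // descend first
                continue;
            }
            // No children: climb until a next sibling exists, never past start.
            while (n != start && n.getNextSibling() == null) n = n.getParentNode();
            n = (n == start) ? null : n.getNextSibling();
        }
        return names;
    }

    // Convenience parser for the example; wraps JAXP's checked exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<root><a><b/></a>text<c/></root>");
        System.out.println(elementNames(doc.getDocumentElement()));  // [root, a, b, c]
    }
}
```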

<p> Traversal APIs are an optional part of DOM Level 2, providing
a not-so-lightweight way to walk over DOM trees, if your application
didn't already have such utilities for use with data represented via
DOM. Implementing this helped debug the (optional) event and mutation
event subsystems, so it's provided here. </p>
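<p>A sketch of the Traversal API in use: a <code>NodeIterator</code> obtained from the standard <code>DocumentTraversal</code> interface, restricted to element nodes with <code>NodeFilter.SHOW_ELEMENT</code>. It assumes the JAXP-supplied DOM implements Traversal (true of the JDK's built-in DOM, but an assumption about your runtime). Because this package doesn't implement <code>TreeWalker</code>, the sketch sticks to <code>NodeIterator</code>. <code>IteratorDemo</code> is a hypothetical name.</p>

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class IteratorDemo {
    static List<String> elementNames(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        // Elements only, no custom NodeFilter, expand entity references.
        NodeIterator it = ((DocumentTraversal) doc).createNodeIterator(
            doc.getDocumentElement(), NodeFilter.SHOW_ELEMENT, null, true);
        List<String> names = new ArrayList<>();
        try {
            for (Node n = it.nextNode(); n != null; n = it.nextNode()) {
                names.add(n.getNodeName());
            }
        } finally {
            it.detach();   // release the iterator once done
        }
        return names;
    }
}
```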

<p> At this writing, the "TreeWalker" interface isn't implemented. </p>



<h2><a name='avoid'>DOM Functionality to Avoid</a></h2>

<p> For what appear to be a combination of historical and "committee
logic" reasons, DOM has a number of <em>features which I strongly advise
you to avoid using</em> in your library and application code. These
include the following types of DOM nodes; see the documentation for the
implementation class for more information: <ul>

<li> CDATASection
(<a href='DomCDATA.html'>DomCDATA</a> class)
... use normal Text nodes instead, so you don't have to make
every algorithm recognize multiple types of character data</li>

<li> DocumentType
(<a href='DomDoctype.html'>DomDoctype</a> class)
... if this held actual typing information, it might be useful</li>

<li> Entity
(<a href='DomEntity.html'>DomEntity</a> class)
... neither parsed nor unparsed entities work well in DOM; it
won't even tell you which attributes identify unparsed entities</li>

<li> EntityReference
(<a href='DomEntityReference.html'>DomEntityReference</a> class)
... permitted implementation variances are extreme, all children
are readonly, and these can interact poorly with namespaces</li>

<li> Notation
(<a href='DomNotation.html'>DomNotation</a> class)
... only really usable with unparsed entities (which aren't well
supported; see above) or perhaps with PIs after the DTD, not with
NOTATION attributes</li>

</ul>
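<p>If a document arrives with CDATA sections anyway, the first suggestion can be applied mechanically. JAXP's <code>DocumentBuilderFactory.setCoalescing(true)</code> asks the parser to turn CDATA sections into text while it builds the tree. For a tree that already exists, a minimal sketch (hypothetical class name) replaces each CDATASection with a Text node holding the same data, then calls <code>normalize()</code> to merge adjacent text nodes:</p>

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class CdataToText {
    // Gather CDATA sections first so the tree isn't modified mid-walk.
    static void collectCdata(Node n, List<Node> out) {
        for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.CDATA_SECTION_NODE) out.add(c);
            else collectCdata(c, out);
        }
    }

    static void replaceCdata(Document doc) {
        List<Node> sections = new ArrayList<>();
        collectCdata(doc, sections);
        for (Node cdata : sections) {
            Node text = doc.createTextNode(cdata.getNodeValue());
            cdata.getParentNode().replaceChild(text, cdata);
        }
        doc.normalize();   // merge the new Text nodes with their neighbours
    }

    // Default factory settings keep CDATA sections (coalescing is off).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

<p>The character data is unchanged; only the node types (and how a serializer prints them) differ afterwards.</p>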

<p> If you really need to use unparsed entities or notations, use SAX;
it offers better support for all DTD-related functionality. It also exposes actual
document typing information (such as element content models).</p>
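<p>A minimal sketch of the SAX route, using the standard <code>org.xml.sax.DTDHandler</code> callbacks (inherited through <code>DefaultHandler</code>) to record the notation and unparsed-entity declarations that DOM handles poorly. The sample DTD and the class name are made up for illustration. Element content models are reachable too, through the SAX2 <code>DeclHandler</code> extension, which this sketch doesn't cover.</p>

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class DtdDeclarations extends DefaultHandler {
    final List<String> notations = new ArrayList<>();
    final List<String> unparsedEntities = new ArrayList<>();

    @Override
    public void notationDecl(String name, String publicId, String systemId) {
        notations.add(name);
    }

    @Override
    public void unparsedEntityDecl(String name, String publicId,
                                   String systemId, String notationName) {
        // Record which notation each unparsed entity is declared with.
        unparsedEntities.add(name + " -> " + notationName);
    }

    static DtdDeclarations read(String xml) {
        DtdDeclarations handler = new DtdDeclarations();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return handler;
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE doc [\n"
            + "  <!NOTATION gif SYSTEM 'image/gif'>\n"
            + "  <!ENTITY logo SYSTEM 'logo.gif' NDATA gif>\n"
            + "]>\n<doc/>";
        DtdDeclarations d = read(xml);
        System.out.println(d.notations);         // [gif]
        System.out.println(d.unparsedEntities);  // [logo -> gif]
    }
}
```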

<p> Also, when accessing attribute values, use methods that provide their
values as single strings, rather than those which expose value substructure
(Text and EntityReference nodes). (See the <a href='DomAttr.html'>DomAttr</a>
documentation for more information.) </p>
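<p>For instance, <code>Element.getAttribute</code> (or <code>Attr.getValue</code>) hands back the whole value as one string with entity references already expanded, so there's no need to walk an attribute's Text and EntityReference children. A minimal sketch, with a made-up sample document (its internal entity <code>co</code> is purely illustrative) and a hypothetical class name:</p>

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class AttributeValues {
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element item = parseRoot(
            "<!DOCTYPE item [<!ENTITY co 'Acme'>]><item owner='&co; Ltd'/>");
        // Preferred: the value as a single string, entity already expanded.
        String owner = item.getAttribute("owner");
        Attr attr = item.getAttributeNode("owner");
        System.out.println(owner);                          // Acme Ltd
        System.out.println(owner.equals(attr.getValue()));  // true
    }
}
```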

<p> Note that many of these features were provided as partial support for
editor functionality (including the incomplete DTD access). Full editor
functionality requires access to potentially malformed lexical structure,
at the level of unparsed tokens and below. Access at such levels is so
complex that using it in non-editor applications sacrifices all the
benefits of XML; editor applications need extremely specialized APIs. </p>

<p> (This isn't a slam against DTDs, note; only against the broken support
for them in DOM. Despite the inclusion of some dubious SGML legacy features
such as notations and unparsed entities,
and the ongoing proliferation of alternative schema and validation tools,
DTDs are still the most widely adopted tool
to constrain XML document structure.
Alternative schemes generally focus on data-transfer-style
applications; open document architectures comparable to
DocBook 4.0 don't yet exist in the schema world.
Feel free to use DTDs; just don't expect DOM to help you.) </p>

</body>
</html>