<html><head><title>
blah
<!--
/*
* Copyright (C) 1999-2001 The Free Software Foundation, Inc.
*/
-->
</title></head><body>

<p>This package exposes a kind of XML processing pipeline, based on sending
SAX events, whose stages can be used as components of application architectures.
Pipelines convey streams of processing events from a producer
to one or more consumers, and let each consumer control the data seen by
later consumers.

<p> The <a href="PipelineFactory.html">PipelineFactory</a> class
accepts a syntax describing how to construct some simple pipelines. Strings
describing such pipelines can be used in command line tools (see the
<a href="../util/DoParse.html">DoParse</a> class)
and in other places where it is
useful to let processing be easily reconfigured. Pipelines can of course
also be constructed programmatically, which provides access to options that the
factory does not expose.

<p> Web applications are supported by making it easy for servlets (or
non-Java web application components) to be part of a pipeline. They can
originate XML (or XHTML) data through an <em>InputSource</em> or in
response to XML messages sent from clients using <em>CallFilter</em>
pipeline stages. Such facilities are available using the simple syntax
for pipeline construction.


<h2> Programming Models </h2>

<p> Pipelines should be simple to understand.

<ul>
<li> XML content, typically entire documents,
is pushed through consumers by producers.

<li> Pipelines are basically about consuming SAX2 callback events,
where the events encapsulate XML infoset-level data.<ul>

<li> Pipelines are constructed by taking one or more consumer
stages and combining them to produce a composite consumer.

<li> A pipeline is presumed to have pending tasks and state from
the beginning of its ContentHandler.startDocument() callback until
it has returned from its ContentHandler.endDocument() callback.

<li> Pipelines may have multiple output stages ("fan-out")
or multiple input stages ("fan-in") when appropriate.

<li> Pipelines may be long-lived, but need not be.

</ul>

<li> There is flexibility about event production. <ul>

<li> SAX2 XMLReader objects are producers, which
provide a high level "pull" model: documents (text or DOM) are parsed,
and the parser pushes individual events through the pipeline.

<li> Events can be pushed directly to event consumer components
by application modules, if they invoke SAX2 callbacks directly.
That is, application modules use the XML Infoset as exposed
through SAX2 event callbacks.

</ul>

<li> Multiple producer threads may concurrently access a pipeline,
if they coordinate appropriately.

<li> Pipeline processing is not the only framework applications
will use.

</ul>


<h3> Producers: XMLReader or Custom </h3>

<p> Many producers will be SAX2 XMLReader objects, and
will read (pull) data which is then written (pushed) as events.
Typically these will parse XML text (using a parser acquired from
<code>org.xml.sax.helpers.XMLReaderFactory</code>) or a DOM tree
(using a <code><a href="../util/DomParser.html">DomParser</a></code>).
These may be bound to an event consumer using a convenience routine,
<em><a href="EventFilter.html">EventFilter</a>.bind()</em>.
Once bound, these producers may be given additional documents to
send through their pipelines.
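<p> The pull-then-push behaviour of an XMLReader producer can be sketched
using only the SAX2 classes in the JDK. This is an illustration, not this
package's API: <code>ReaderProducerSketch</code> and <code>NameRecorder</code>
are hypothetical names, the reader is obtained from
<code>SAXParserFactory</code> rather than <code>XMLReaderFactory</code>, and a
plain <code>setContentHandler()</code> call stands in for the binding step.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class ReaderProducerSketch {

    /** Hypothetical stand-in consumer: records element names as start events arrive. */
    static class NameRecorder extends DefaultHandler {
        final List<String> names = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            names.add(qName);
        }
    }

    /** Parses each document in turn, pushing its events at one consumer. */
    static List<String> parseAll(String... documents) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader producer = factory.newSAXParser().getXMLReader();

            NameRecorder consumer = new NameRecorder();
            producer.setContentHandler(consumer);

            // The same bound reader is reused for every document.
            for (String document : documents) {
                producer.parse(new InputSource(new StringReader(document)));
            }
            return consumer.names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseAll("<a><b/></a>", "<c/>"));
    }
}
```

<p> Both documents flow through the same consumer, which is the property the
paragraph above describes for a producer that stays bound to its pipeline.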

<p> In other cases, you will write producers yourself. For example, some
data structures might know how to write themselves out using one or
more XML models, expressed as sequences of SAX2 event callbacks.
An application module might
itself be a producer, issuing startDocument and endDocument events
and then asking those data structures to write themselves out to a
given EventConsumer, or walking data structures (such as JDBC query
results) and applying its own conversion rules. WAP format XML
(WBXML) can be directly converted to producer output.
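<p> A custom producer of this kind might look roughly like the sketch below.
It assumes only the standard SAX2 <code>ContentHandler</code> interface; the
<code>Point</code> data structure, the <code>produce()</code> helper and the
<code>EventLog</code> consumer are hypothetical names invented for the
example, and the handler is passed in directly rather than being fetched from
an EventConsumer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

class DirectProducerSketch {

    /** Hypothetical data structure that writes itself out as SAX2 events. */
    static class Point {
        final int x, y;

        Point(int x, int y) { this.x = x; this.y = y; }

        void writeTo(ContentHandler out) throws SAXException {
            AttributesImpl atts = new AttributesImpl();
            atts.addAttribute("", "x", "x", "CDATA", Integer.toString(x));
            atts.addAttribute("", "y", "y", "CDATA", Integer.toString(y));
            out.startElement("", "point", "point", atts);
            out.endElement("", "point", "point");
        }
    }

    /** The application module is the producer: it brackets the document itself. */
    static void produce(List<Point> points, ContentHandler out) throws SAXException {
        out.startDocument();
        out.startElement("", "points", "points", new AttributesImpl());
        for (Point p : points) {
            p.writeTo(out);
        }
        out.endElement("", "points", "points");
        out.endDocument();
    }

    /** Stand-in consumer that logs the events it receives. */
    static class EventLog extends DefaultHandler {
        final List<String> events = new ArrayList<>();

        @Override public void startDocument() { events.add("startDocument"); }
        @Override public void endDocument() { events.add("endDocument"); }
        @Override public void startElement(String uri, String localName,
                                           String qName, Attributes atts) {
            events.add("start:" + qName);
        }
        @Override public void endElement(String uri, String localName, String qName) {
            events.add("end:" + qName);
        }
    }

    static List<String> demo() {
        EventLog log = new EventLog();
        try {
            produce(Arrays.asList(new Point(1, 2), new Point(3, 4)), log);
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return log.events;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```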

<p> SAX2 introduced an "XMLFilter" interface, which is a kind of XMLReader.
It is most useful in conjunction with its XMLFilterImpl helper class;
see the <em><a href="EventFilter.html">EventFilter</a></em> javadoc
for information contrasting that XMLFilterImpl approach with the
relevant parts of this pipeline framework. Briefly, such XMLFilterImpl
subclasses can be either producers or consumers, and are more limited in
configuration flexibility. In this framework, the focus of filters is
on the EventConsumer side; see the section on
<a href="#fitting">pipe fitting</a> below.


<h3> Consume to Standard or Custom Data Representations </h3>

<p> Many consumers will be used to create standard representations of XML
data. The <a href="TextConsumer.html">TextConsumer</a> takes its events
and writes them as text for a single XML document,
using an internal <a href="../util/XMLWriter.html">XMLWriter</a>.
The <a href="DomConsumer.html">DomConsumer</a> takes its events and uses
them to create and populate a DOM Document.

<p> In other cases, you will write consumers yourself. For example,
you might use a particular unmarshaling filter to produce objects
that fit your application's requirements, instead of using DOM.
Such consumers work at the level of XML data models, rather than with
specific representations such as XML text or a DOM tree. You could
convert your output directly to WAP format data (WBXML).


<h3><a name="fitting">Pipe Fitting</a></h3>

<p> Pipelines are composite event consumers, with each stage having
the opportunity to transform the data before delivering it to any
subsequent stages.

<p> The <a href="PipelineFactory.html">PipelineFactory</a> class
provides access to much of this functionality through a simple syntax.
See the table in that class's javadoc describing a number of standard
components. Direct API calls are still needed for many of the most
interesting pipeline configurations, including ones leveraging actual
or logical concurrency.

<p> Four basic types of pipe fitting are directly supported. These may
be used to construct complex pipeline networks. <ul>

<li> <a href="TeeConsumer.html">TeeConsumer</a> objects split event
flow so it goes to two different consumers, one before the other.
This is a basic form of event fan-out; you can use this class to
copy events to any number of output pipelines.

<li> Clients can call remote components through HTTP or HTTPS using
the <a href="CallFilter.html">CallFilter</a> component, and Servlets
can implement such components by extending the
<a href="XmlServlet.html">XmlServlet</a> component. Java is not
required on either end, and transport protocols other than HTTP may
also be used.

<li> <a href="EventFilter.html">EventFilter</a> objects selectively
provide handling for callbacks, and can pass unhandled ones to a
subsequent stage. They are often subclassed, since much of the
basic filtering machinery is already in place in the base class.

<li> Applications can merge two event flows by just using the same
consumer in each one. If multiple threads are in use, synchronization
needs to be addressed by the appropriate application level policy.

</ul>
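<p> The fan-out idea behind a tee can be sketched with plain SAX2 handlers.
The code below is not the TeeConsumer class from this package; <code>Tee</code>
and <code>Tagger</code> are hypothetical names, and only a handful of
ContentHandler callbacks are forwarded. It shows each event reaching the first
consumer before the second.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

class TeeSketch {

    /** Hypothetical minimal tee: every event goes to 'first', then to 'rest'. */
    static class Tee extends DefaultHandler {
        private final ContentHandler first;
        private final ContentHandler rest;

        Tee(ContentHandler first, ContentHandler rest) {
            this.first = first;
            this.rest = rest;
        }

        @Override public void startDocument() throws SAXException {
            first.startDocument();
            rest.startDocument();
        }
        @Override public void endDocument() throws SAXException {
            first.endDocument();
            rest.endDocument();
        }
        @Override public void startElement(String uri, String localName,
                String qName, Attributes atts) throws SAXException {
            first.startElement(uri, localName, qName, atts);
            rest.startElement(uri, localName, qName, atts);
        }
        @Override public void endElement(String uri, String localName,
                String qName) throws SAXException {
            first.endElement(uri, localName, qName);
            rest.endElement(uri, localName, qName);
        }
        @Override public void characters(char[] ch, int start, int length)
                throws SAXException {
            first.characters(ch, start, length);
            rest.characters(ch, start, length);
        }
    }

    /** Stand-in consumer that writes tagged start events to a shared log. */
    static class Tagger extends DefaultHandler {
        private final String tag;
        private final List<String> log;

        Tagger(String tag, List<String> log) {
            this.tag = tag;
            this.log = log;
        }

        @Override public void startElement(String uri, String localName,
                                           String qName, Attributes atts) {
            log.add(tag + ":" + qName);
        }
    }

    static List<String> demo() {
        List<String> log = new ArrayList<>();
        Tee tee = new Tee(new Tagger("first", log), new Tagger("rest", log));
        try {
            tee.startDocument();
            tee.startElement("", "doc", "doc", new AttributesImpl());
            tee.endElement("", "doc", "doc");
            tee.endDocument();
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

<p> Because a tee is itself a consumer, tees can be nested, which is one way
to copy events to more than two outputs.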

<p> Note that filters can be as complex as
<a href="XsltFilter.html">XSLT transforms</a>
(where available) on input data, or as simple as removing syntax data
such as ignorable whitespace, comments, and CDATA delimiters.
Some simple "built-in" filters are part of this package.


<h3> Coding Conventions: Filter and Terminus Stages</h3>

<p> If you follow these coding conventions, your classes may be used
directly (by giving the full class name) in pipeline descriptions as understood
by the PipelineFactory. There are four constructors the factory may
try to use; in order of decreasing numbers of parameters, these are: <ul>

<li> Filters that need a single String setup parameter should have
a public constructor with two parameters: that string, then the
EventConsumer holding the "next" consumer to get events.

<li> Filters that don't need setup parameters should have a public
constructor that accepts a single EventConsumer holding the "next"
consumer to get events when they are done.

<li> Terminus stages may have a public constructor taking a single
parameter: the string value of the setup parameter.

<li> Terminus stages may have a public no-parameters constructor.

</ul>

<p> Of course, classes may support more than one such usage convention;
if they do, they can automatically be used in multiple modes. If you
try to use a terminus class as a filter, and that terminus has a constructor
with the appropriate number of arguments, it is automatically wrapped in
a "tee" filter.


<h2> Debugging Tip: "Tee" Joints can Snapshot Data</h2>

<p> It can sometimes be hard to see what's happening when something
goes wrong. This is easily fixed: just snapshot the data. Then you can find
out where things start to go wrong.

<p> If you're using pipeline descriptors so that they're easily
administered, just stick a <em>write ( filename )</em>
filter into the pipeline at an appropriate point.

<p> Inside your programs, you can do the same thing directly: perhaps
by saving a Writer (such as a StringWriter) in a variable, using that
to create a TextConsumer, and making that the first part of a tee --
splicing that into your pipeline at a convenient location.
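<p> The StringWriter half of that technique can be sketched with JDK classes
alone. Here a JAXP <code>TransformerHandler</code> stands in for TextConsumer
(an assumption made so the example runs without this package), and
<code>SnapshotSketch</code> is a hypothetical name. In a real pipeline this
handler would be one branch of a tee rather than the only consumer.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

class SnapshotSketch {

    /** Pushes a tiny document at a text-writing handler and returns the text. */
    static String snapshot() {
        try {
            // Keep the Writer in a variable so the text can be inspected later.
            StringWriter saved = new StringWriter();

            SAXTransformerFactory factory =
                (SAXTransformerFactory) TransformerFactory.newInstance();
            TransformerHandler writer = factory.newTransformerHandler();
            writer.getTransformer()
                  .setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            writer.setResult(new StreamResult(saved));

            // Events that would normally arrive from earlier pipeline stages.
            char[] text = "hello".toCharArray();
            writer.startDocument();
            writer.startElement("", "greeting", "greeting", new AttributesImpl());
            writer.characters(text, 0, text.length);
            writer.endElement("", "greeting", "greeting");
            writer.endDocument();

            return saved.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(snapshot());
    }
}
```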

<p> You can also use a DomConsumer to buffer the data, but remember
that DOM doesn't save all the information that XML provides, so that DOM
snapshots are relatively low fidelity. They also are substantially more
expensive in terms of memory than a StringWriter holding similar data.

<h2> Debugging Tip: Non-XML Producers</h2>

<p> Producers in pipelines don't need to start from XML
data structures, such as text in XML syntax (likely coming
from some <em>XMLReader</em> that parses XML) or a
DOM representation (perhaps with a
<a href="../util/DomParser.html">DomParser</a>).

<p> One common type of event producer will instead make
direct calls to SAX event handlers returned from an
<a href="EventConsumer.html">EventConsumer</a>,
for example making <em>ContentHandler.startElement</em>
calls and matching <em>ContentHandler.endElement</em> calls.

<p> Applications making such calls can catch certain
common "syntax errors" by using a
<a href="WellFormednessFilter.html">WellFormednessFilter</a>.
That filter will detect (and report) erroneous input data
such as mismatched document, element, or CDATA start/end calls.
Use such a filter near the head of the pipeline that your
producer feeds, at least while debugging, to help ensure that
you're providing legal XML Infoset data.
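<p> To make the idea concrete, here is a much-reduced sketch of such a check,
covering element nesting only. It is not the WellFormednessFilter from this
package: <code>NestingCheck</code> is a hypothetical class built on the
standard SAX2 <code>DefaultHandler</code>, and it reports problems by throwing
a <code>SAXException</code>, which is an assumption of the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

class NestingCheckSketch {

    /** Hypothetical filter: verifies start/end element pairing, then forwards. */
    static class NestingCheck extends DefaultHandler {
        private final ContentHandler next;
        private final Deque<String> open = new ArrayDeque<>();

        NestingCheck(ContentHandler next) {
            this.next = next;
        }

        @Override public void startDocument() throws SAXException {
            open.clear();
            next.startDocument();
        }

        @Override public void startElement(String uri, String localName,
                String qName, Attributes atts) throws SAXException {
            open.push(qName);
            next.startElement(uri, localName, qName, atts);
        }

        @Override public void endElement(String uri, String localName,
                String qName) throws SAXException {
            if (open.isEmpty() || !open.peek().equals(qName)) {
                throw new SAXException("mismatched end tag: " + qName);
            }
            open.pop();
            next.endElement(uri, localName, qName);
        }

        @Override public void endDocument() throws SAXException {
            if (!open.isEmpty()) {
                throw new SAXException("unclosed element: " + open.peek());
            }
            next.endDocument();
        }
    }

    /** A deliberately buggy producer; returns the error the check reports. */
    static String runBuggyProducer() {
        ContentHandler head = new NestingCheck(new DefaultHandler());
        try {
            head.startDocument();
            head.startElement("", "a", "a", new AttributesImpl());
            head.startElement("", "b", "b", new AttributesImpl());
            head.endElement("", "a", "a"); // bug: "b" is still open
            return "no error";
        } catch (SAXException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(runBuggyProducer());
    }
}
```

<p> Placing the check at the head of the pipeline means the error surfaces at
the faulty producer call, before later stages see inconsistent events.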

<p> You can also arrange to validate data on the fly.
For DTD validation, you can configure a
<a href="ValidationConsumer.html">ValidationConsumer</a>
to work as a filter, using any DTD you choose.
Other validation schemes can be handled with other
validation filters.

</body></html>