Using XML pipelines - part 1


William Brogden
09.06.2006


Mechanical assembly lines achieve efficiency by moving product through a series of fixed machines, each one specialized to do one function very efficiently. This familiar image has inspired software designers to attempt something similar. The first example I can think of is the Unix toolkit for pipeline processing of text.

At a higher level of abstraction, application architects use concepts such as "workflow" or "dataflow" to describe the movement of information as documents or messages through a set of processes. These days more and more documents and messages are formatted in XML, so why not an XML pipeline?

Why another XML tool?

Let's face it, XML may provide a lot of great data manipulation functionality, but speed and low memory use are not at the top of the list. The general XML manipulation tools such as XSLT, XPath and XQuery are flexible, but the additional layers of code are anything but fast.

XML pipeline components can be designed to do one thing and do it well. I think there is reason to believe that we could evolve a set of pipeline components that could be configured and plugged together to accomplish processing tasks rapidly and with minimum resource use. Later in this article and in part 2 I will review some of the attempts that have been made, but for right now let's look at some basics.

How to feed a pipeline

There are three different ways to move XML data through a pipeline: as characters (either in a stream or as Strings), as SAX events, or as Document Object Model (DOM) elements. I ran some time trials, reading an 8.9 MB XML document three different ways using Java 1.5 standard library classes. Here are the results, normalized to the plain stream reading time: as a stream, 1.0; as a stream turned into SAX events, 2.0; and as a stream turned into a DOM, 9.2.

As far as memory usage is concerned, a DOM in memory takes much more space than just the characters in the file because of all the objects created. SAX events, on the other hand, are quite small. I think SAX pipelines offer many advantages in addition to speed and a small memory footprint for processing large XML documents.
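The three input styles compared above can be sketched roughly as follows. This is not the code used for the time trials; the class and method names are hypothetical, and a short in-memory String stands in for the 8.9 MB file.

```java
import java.io.Reader;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; none of these names come from the article.
class ThreeWaysToRead {

    // 1. Plain character stream: the XML is read but never parsed.
    static int countChars(String xml) throws Exception {
        int chars = 0;
        Reader in = new StringReader(xml);
        while (in.read() != -1) {
            chars++;
        }
        in.close();
        return chars;
    }

    // 2. SAX: the parser pushes one small event at a time to the handler.
    static int countElementsWithSax(String xml) throws Exception {
        final int[] elements = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                elements[0]++;
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return elements[0];
    }

    // 3. DOM: the whole document is built as a tree of objects before any use.
    static int countElementsWithDom(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName("*").getLength();
    }

    // Runs all three; checked exceptions are wrapped to keep callers simple.
    static String summary(String xml) {
        try {
            return "chars=" + countChars(xml)
                    + " sax=" + countElementsWithSax(xml)
                    + " dom=" + countElementsWithDom(xml);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

To get a rough timing comparison, each method could be wrapped in a pair of System.nanoTime() calls and run against a large file; the ratios will depend on the parser, the JVM and the document.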

A pipeline for SAX events

SAX stands for Simple API for XML. A SAX parser recognizes the various parts of an XML document, creates objects incorporating the data and passes the objects to "handler" methods which have been registered with the parser. The key point which opens the floodgates of possibilities is that the stream of SAX events generated by a SAX parser contains the complete infoset of the original XML document, one piece at a time. Any process that can work with a single event can do some work and then pass the event on to the next process.

Here are the signatures of the Java methods that handle the data for XML start and end element tags in the org.xml.sax.ContentHandler interface. In a pipeline, you would use code in these methods to examine and possibly modify these parameters before passing them to the next handler in the pipeline.

public void startElement(String uri, String localName, String qName, Attributes atts) throws SAXException

public void endElement(String uri, String localName, String qName) throws SAXException

The content of these parameters is as follows:

uri: If the document uses XML namespaces AND the parser has namespace processing turned on, this String will contain the namespace URI of the element - otherwise it will be an empty String.

localName: If the parser has namespace processing turned on, this String will contain the element name minus any namespace prefix - otherwise it will be an empty String.

qName: This String will have the complete name of the element, with prefix if any.

atts: This is a reference to an Attributes object containing the names and values of all attributes in the element start tag.

For example, given a SOAP message that starts like this:

<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
     xmlns:xsd="http://www.w3.org/2001/XMLSchema" 
     xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>

The data passed to the startElement method when the parser encountered the <soap:Body> tag would be: uri = "http://schemas.xmlsoap.org/soap/envelope/", localName = "Body" and qName = "soap:Body". The Attributes object would exist but it would have a count of zero attributes.
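The values described above can be checked with a small handler. This is a minimal sketch: the class name is hypothetical, and the closing tags in the SOAP constant are an assumption added so that the fragment from the article parses as a complete document.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of the article's own code.
class StartElementDemo {

    // The article's SOAP fragment, closed off here (an assumption) so it is well-formed.
    static final String SOAP =
          "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n"
        + "<soap:Envelope xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"\n"
        + "     xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\"\n"
        + "     xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\">\n"
        + "  <soap:Body>\n"
        + "  </soap:Body>\n"
        + "</soap:Envelope>";

    // Returns one line per start tag, listing the parameters startElement received.
    static List<String> describeStartTags(String xml) {
        final List<String> lines = new ArrayList<String>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                lines.add("uri=" + uri + " localName=" + localName
                        + " qName=" + qName + " attributes=" + atts.getLength());
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // without this, uri and localName arrive empty
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describeStartTags(SOAP)) {
            System.out.println(line);
        }
    }
}
```

With the parser bundled in the JDK, the second line printed should carry the uri, localName and qName values listed above for <soap:Body>, with an attribute count of zero.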

Building and using SAX pipelines

The data flow in a practical SAX pipeline is a little hard to explain without diagrams. Fortunately, chapter 8, "SAX Filters," of Elliotte Rusty Harold's excellent book "Processing XML with Java" has numerous examples of connecting SAX handling components, with diagrams showing the data flow. This chapter has been made available online as shown in the references below. These examples use the standard JAXP (Java API for XML Processing) library classes, so if you have Java 1.4 or 1.5 installed, you don't need anything else.
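As a rough idea of the wiring involved, here is a minimal sketch of a two-stage pipeline built on the standard org.xml.sax.helpers.XMLFilterImpl class. It is not taken from the book; the two filters, the demo class and the sample behavior (renaming <b> to <strong>, upper-casing text) are hypothetical choices made only to show events passing from stage to stage.

```java
import java.io.StringReader;
import java.util.Locale;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Stage 1 (hypothetical): renames <b> elements to <strong>, passes the rest through.
class RenameFilter extends XMLFilterImpl {
    RenameFilter(XMLReader parent) { super(parent); }

    private static String rename(String name) {
        return "b".equals(name) ? "strong" : name;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        super.startElement(uri, rename(localName), rename(qName), atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        super.endElement(uri, rename(localName), rename(qName));
    }
}

// Stage 2 (hypothetical): upper-cases character data before passing it on.
class UpperCaseTextFilter extends XMLFilterImpl {
    UpperCaseTextFilter(XMLReader parent) { super(parent); }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        char[] upper = new String(ch, start, length).toUpperCase(Locale.ROOT).toCharArray();
        super.characters(upper, 0, upper.length);
    }
}

class FilterChainDemo {
    // Wires parser -> RenameFilter -> UpperCaseTextFilter -> sink and returns the sink's text.
    static String run(String xml) {
        final StringBuilder out = new StringBuilder();
        // The sink writes tags and text back out; it does no escaping, which is fine for a demo.
        DefaultHandler sink = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                out.append('<').append(qName).append('>');
            }
            @Override
            public void endElement(String uri, String localName, String qName) {
                out.append("</").append(qName).append('>');
            }
            @Override
            public void characters(char[] ch, int start, int length) {
                out.append(ch, start, length);
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader parser = factory.newSAXParser().getXMLReader();
            XMLFilterImpl renamer = new RenameFilter(parser);
            XMLFilterImpl shouter = new UpperCaseTextFilter(renamer);
            shouter.setContentHandler(sink);
            // Calling parse on the last filter starts the parser at the head of the chain.
            shouter.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }
}
```

Each filter treats the stage before it as its parent, so a stage can be added, removed or reordered by changing only the constructor calls in run.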

Applications of SAX pipeline components

Here are some of the tasks a single component can accomplish. Some of these I have coded myself, others come from published examples.

Extracting Statistics - using the startElement method, a component can keep a count of various elements and the frequency of various attribute values.

Removing Elements - a component can selectively remove specified elements so that one master XML document can serve many purposes.

Adding elements or attributes based on computation - for example you could do a database query to look up a part number and add a part description to a purchase order.
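The three tasks above can each be sketched as one small filter. This is an illustration under stated assumptions, not any of the components mentioned in the text: all class names, the "note" and "partNo" names, and the Map standing in for the database query are hypothetical.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Extracting statistics: counts start tags by name, passes every event on unchanged.
class ElementCounter extends XMLFilterImpl {
    final Map<String, Integer> counts = new TreeMap<String, Integer>();
    ElementCounter(XMLReader parent) { super(parent); }
    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        Integer seen = counts.get(qName);
        counts.put(qName, seen == null ? 1 : seen + 1);
        super.startElement(uri, localName, qName, atts);
    }
}

// Removing elements: swallows one named element and everything nested inside it.
class ElementRemover extends XMLFilterImpl {
    private final String unwanted;
    private int depth = 0; // greater than zero while inside an unwanted element
    ElementRemover(XMLReader parent, String unwanted) { super(parent); this.unwanted = unwanted; }
    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if (depth > 0 || unwanted.equals(qName)) { depth++; return; }
        super.startElement(uri, localName, qName, atts);
    }
    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (depth > 0) { depth--; return; }
        super.endElement(uri, localName, qName);
    }
    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (depth == 0) super.characters(ch, start, length);
    }
}

// Adding attributes: the Map is a stand-in for the database lookup.
class DescriptionAdder extends XMLFilterImpl {
    private final Map<String, String> catalog;
    DescriptionAdder(XMLReader parent, Map<String, String> catalog) { super(parent); this.catalog = catalog; }
    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        String partNo = atts.getValue("partNo");
        String description = (partNo == null) ? null : catalog.get(partNo);
        if (description != null) {
            AttributesImpl extended = new AttributesImpl(atts); // copy, then append
            extended.addAttribute("", "description", "description", "CDATA", description);
            atts = extended;
        }
        super.startElement(uri, localName, qName, atts);
    }
}

class PipelineTasksDemo {
    // Returns { serialized tags after filtering, element counts before filtering }.
    static String[] run(String xml, Map<String, String> catalog) {
        final StringBuilder out = new StringBuilder();
        DefaultHandler sink = new DefaultHandler() { // writes tags only, no escaping
            @Override public void startElement(String uri, String localName, String qName, Attributes atts) {
                out.append('<').append(qName);
                for (int i = 0; i < atts.getLength(); i++) {
                    out.append(' ').append(atts.getQName(i)).append("=\"").append(atts.getValue(i)).append('"');
                }
                out.append('>');
            }
            @Override public void endElement(String uri, String localName, String qName) {
                out.append("</").append(qName).append('>');
            }
        };
        try {
            XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            ElementCounter counter = new ElementCounter(parser);
            ElementRemover remover = new ElementRemover(counter, "note");
            DescriptionAdder adder = new DescriptionAdder(remover, catalog);
            adder.setContentHandler(sink);
            adder.parse(new InputSource(new StringReader(xml)));
            return new String[] { out.toString(), counter.counts.toString() };
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the counter sits first in the chain, it sees the removed elements too; placing it after the remover would count only what survives. The remover here ignores processing instructions and ignorable whitespace, which a production component would also need to handle.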

Why not use XSLT instead?

While it is true that XSLT could in theory be used to perform these tasks, it is slower and more memory intensive. XSLT shines in many areas, especially when a major rearrangement of the data is required. A SAX pipeline is going to be much faster if the problem is suited to sequential processing of elements without a major rearrangement.

In my next article I will cover the W3C XML-Pipeline specification and some example toolkits based on pipeline principles.

References

Online version of chapter 8, "SAX Filters," in Elliotte Rusty Harold's book "Processing XML with Java"

The Java API for XML Processing (JAXP)

